[jira] [Updated] (CONNECTORS-1298) Housekeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Avdeev updated CONNECTORS-1298:
--
Summary: Housekeeping: jetty webapp temp directories don't get removed upon shutdown (was: Houskeeping: jetty webapp temp directories don't get removed upon shutdown)

> Housekeeping: jetty webapp temp directories don't get removed upon shutdown
> ---
>
> Key: CONNECTORS-1298
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1298
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework core
> Environment: Windows
> Reporter: Konstantin Avdeev
> Assignee: Furkan KAMACI
> Priority: Minor
> Attachments: mcf-temp-folders.jpg
>
>
> Every MCF restart leaves behind webapp temp dirs like this:
> {code}
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/
> {code}
> (or under {{java.io.tmpdir}}, if set)
> Expected behaviour: delete these dirs upon exit.
> Could it help to set jetty's {{persistTempDirectory}} to false for these contexts?
> Thank you!

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
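Independent of Jetty's own {{persistTempDirectory}} setting, the expected behaviour ("delete these dirs upon exit") could be approximated with plain JDK calls. The following is only a minimal sketch under stated assumptions, not ManifoldCF or Jetty code: the class name JettyTempCleaner, its methods, and the idea of matching directories by name prefix are hypothetical, and the prefix would have to be chosen to match the directory names shown in the report. Whether a shutdown hook actually fires for every shutdown path on Windows (Ctrl-C, service stop, kill) is not established here, and files still locked by the JVM may fail to delete.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
class JettyTempCleaner {

    /** Recursively deletes one directory tree, children before parents. */
    static void deleteTree(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits deeper paths before the directories containing them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes every directory directly under tmpDir whose name starts with prefix; returns how many were removed. */
    static int cleanLeftovers(Path tmpDir, String prefix) {
        int removed = 0;
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(tmpDir, prefix + "*")) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir)) {
                    deleteTree(dir);
                    removed++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return removed;
    }

    /** Registers the cleanup to run when the JVM shuts down normally. */
    static void installHook(Path tmpDir, String prefix) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> cleanLeftovers(tmpDir, prefix)));
    }
}
```

A caller would invoke something like JettyTempCleaner.installHook(Paths.get(System.getProperty("java.io.tmpdir")), "jetty-0.0.0.0-8345-mcf-") once at startup. Matching by prefix is a deliberate simplification: it would also remove directories left by earlier runs, which is the housekeeping effect asked for, but it must not be so broad that it catches another Jetty instance on the same machine.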
[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861643#comment-15861643 ]

Konstantin Avdeev commented on CONNECTORS-1298:
---
I don't think it's a Jetty issue; it looks like we have here the case described at the bottom of [Temporary Directories|http://www.eclipse.org/jetty/documentation/current/ref-temporary-directories.html]
[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861629#comment-15861629 ]

Konstantin Avdeev commented on CONNECTORS-1298:
---
These signals are sent sequentially, so Ctrl-C should be the only one.
And again, the same issue occurs with {{Ctrl-C}} (aka {{^C}}) when started from a cmd window.
[jira] [Comment Edited] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861595#comment-15861595 ]

Konstantin Avdeev edited comment on CONNECTORS-1298 at 2/10/17 5:55 PM:

Ctrl-C. When it is running as an [NSSM|http://nssm.cc/usage#shutdown] service, a Ctrl-C is generated as well:
!http://nssm.cc/images/install_shutdown.png!
P.S. My timeout is set to 5 sec.

was (Author: kavdeev):
Ctrl-C or, when it is running as a [NSSM|http://nssm.cc/usage] service , it generates Ctrl-C as well: !http://nssm.cc/images/install_shutdown.png! P.S. my timeout is set to 5sec
[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861595#comment-15861595 ]

Konstantin Avdeev commented on CONNECTORS-1298:
---
Ctrl-C. When it is running as an [NSSM|http://nssm.cc/usage] service, a Ctrl-C is generated as well:
!http://nssm.cc/images/install_shutdown.png!
P.S. My timeout is set to 5 sec.
[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861488#comment-15861488 ]

Konstantin Avdeev commented on CONNECTORS-1298:
---
Sorry, I can't follow... If they get re-used, then why does Jetty deploy them again and again?
[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861483#comment-15861483 ]

Konstantin Avdeev commented on CONNECTORS-1298:
---
Every MCF restart deploys 3 Jetty contexts; they are left behind in either case (clean shutdown or killed):
!mcf-temp-folders.jpg!
[jira] [Updated] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Avdeev updated CONNECTORS-1298:
--
Attachment: mcf-temp-folders.jpg
[jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571579#comment-15571579 ]

Konstantin Avdeev edited comment on CONNECTORS-1325 at 10/13/16 10:55 AM:
--
The company does not use emojis :) It was just an example of how to reproduce the issue.
The original problem was with the "record separator" char, which is used in some libraries.
BTW: & #30 (decimal) is not a valid XML char either: http://www.w3schools.com/xml/xml_validator.asp

was (Author: kavdeev):
The company does not use emojis :) It was just an example how to reproduce the issue. The original problem was with the "record separator" char, which is used in some libraries.

> Invalid XML character causing job to abort
> --
>
> Key: CONNECTORS-1325
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
> Project: ManifoldCF
> Issue Type: Bug
> Components: SharePoint connector
> Affects Versions: ManifoldCF 2.3
> Reporter: Phil
> Assignee: Karl Wright
> Priority: Blocker
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325-3.patch, CONNECTORS-1325.patch, mcf-bad-ms-char.xml
>
>
> The following error is causing the Manifold job to abort, and subsequently the job is not able to finish.
> It would be good to have the crawler log this error, but not throw an exception which causes the entire job to stop.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "" is an invalid XML character.
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
> at org.apache.manifoldcf.core.common.XMLDoc.(XMLDoc.java:286)
> at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
> at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "" is an invalid XML character.
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
> ... 4 more
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
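For the "record separator" case, one conceivable workaround is to strip numeric character references that XML 1.0 does not allow before the response reaches the parser. The sketch below is an assumption-laden illustration, not the attached patch: the class XmlReferenceScrubber and its methods are hypothetical names, and silently dropping references loses data (an emoji written as two surrogate references would disappear as well), so a real fix would at least log what was removed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parsing helper, for illustration only.
class XmlReferenceScrubber {

    // Matches decimal (&#30;) and hexadecimal (&#x1E;) numeric character references.
    private static final Pattern NUMERIC_REF = Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    /** The Char production of XML 1.0. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes numeric character references whose value is not a legal XML 1.0 character. */
    static String scrub(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body);
            } catch (NumberFormatException e) {
                cp = -1; // too large to be a code point, treat as invalid
            }
            // Keep valid references untouched, drop invalid ones.
            String replacement = isValidXml10(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Applied to a string containing both the decimal and the hex form of the record separator, scrub removes both and leaves a valid reference such as the decimal emoji code point in place.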
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571573#comment-15571573 ]

Konstantin Avdeev commented on CONNECTORS-1325:
---
The hex format is not valid XML: http://www.w3schools.com/xml/xml_validator.asp
The decimal format has no issues. Thanks!
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571456#comment-15571456 ]

Konstantin Avdeev commented on CONNECTORS-1325:
---
An important update!
I tested the "bad" char again by looking into the network traffic (http wire = DEBUG), to see exactly what comes from SharePoint. It turned out that this emoji char gets translated into a "wrong" format on the MCF side:
& # 128512; ---> & # xD83D;& # xDE00;
{code}
DEBUG 2016-10-13 11:39:45,460 (Thread-2572) - http-outgoing-100 << "#' ows__ModerationStatus='0' ows__Level='1' ows_Title='Task emoji ' ows_UniqueId='5;#{8F6DF977-9814-4AA0-B7AE-E29838C508CF}' ows_owshiddenversion='3' ows_FSObjType='5;#0' ows_PermMask='0x7fff' ows_FileRef='5;#sites/test-team/Lists/Main Task List/5_.000' />[\r][\n]"
...
DEBUG 2016-10-13 11:39:45,461 (Worker thread '45') - SharePoint: getListItems FileRef value 'sites/test-team/Lists/Main Task List/5_.000', xml response: 'http://schemas.microsoft.com/sharepoint/soap/;> '
DEBUG 2016-10-13 11:39:45,494 (Worker thread '45') - SharePoint: Can't get version of '/Main Task List///5_.000' because of bad XML characters(?)
{code}
and the code & #128512 is a valid XML 1.0 code!
Could you please take a look at the parser?
Thank you!
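The two forms quoted in this comment can be related with a few lines of plain Java. This is a sketch for illustration only: the class XmlCharCheck and its methods are hypothetical, and escaping each UTF-16 code unit separately is just one plausible way to arrive at the second form; the thread does not establish that this is what actually happens. Code point 128512 is U+1F600, which the XML 1.0 Char production allows, while its two UTF-16 surrogates fall in a range that XML 1.0 excludes.

```java
// Hypothetical helper, for illustration only.
class XmlCharCheck {

    /** The Char production of XML 1.0. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Writes the whole code point as a single decimal character reference. */
    static String singleReference(int cp) {
        return "&#" + cp + ";";
    }

    /** Writes one hex reference per UTF-16 code unit, which splits supplementary characters in two. */
    static String perUnitReferences(int cp) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(cp)) {
            sb.append("&#x").append(Integer.toHexString(unit).toUpperCase()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int emoji = 128512; // U+1F600
        System.out.println(singleReference(emoji) + " valid: " + isValidXml10(emoji));
        System.out.println(perUnitReferences(emoji));
        for (char unit : Character.toChars(emoji)) {
            System.out.println("unit valid: " + isValidXml10(unit));
        }
    }
}
```

Under these assumptions the single reference names a legal character, whereas each per-unit reference names a surrogate, which an XML 1.0 parser is required to reject.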
[jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571237#comment-15571237 ]

Konstantin Avdeev edited comment on CONNECTORS-1325 at 10/13/16 8:43 AM:
-
The Stack Overflow thread you mentioned in the second message here describes the problem quite well: this character encoding was introduced in XML 1.1: https://www.w3.org/TR/xml11/#sec-xml11
A possible solution is setting the correct header:
{code}{code}
I'm afraid it would take ages to get this fixed by MS.
P.S. The correct XML prologue won't help with emojis, but at least it would solve the issue with our "record separator" :)
To be honest, I'm not sure what we could do here; I'm not a fan of workarounds.
We could leave it as it is now, but could you perhaps change the "bad character" warnings to WARN level? Currently they are shown at DEBUG only, which could be misleading in a production environment.
Thanks!

was (Author: kavdeev):
The stackoverflow's thread you mentioned in the second message here, describes the problem quite well: this character encoding was introduces in XML 1.1: https://www.w3.org/TR/xml11/#sec-xml11 and the solution is: setting the correct header: {code}{code} I'm afraid, it would take ages to get this fixed by MS.
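The effect of the XML version in the prologue can be checked with the JAXP parser that ships with the JDK. This is a minimal sketch under assumptions: the class XmlVersionDemo and the tiny test documents are made up for illustration, the exact header meant in the comment above is not preserved, and the behaviour shown is that of the JDK's bundled parser, which may differ from the Xerces build that ManifoldCF uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
class XmlVersionDemo {

    /** Returns true if the document parses, false if the parser rejects it as not well-formed. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Record separator as a character reference, under both XML versions.
        System.out.println(parses("<?xml version=\"1.0\"?><a>&#x1E;</a>"));
        System.out.println(parses("<?xml version=\"1.1\"?><a>&#x1E;</a>"));
        // Emoji written as two surrogate references, under XML 1.1.
        System.out.println(parses("<?xml version=\"1.1\"?><a>&#xD83D;&#xDE00;</a>"));
    }
}
```

The record separator reference should be rejected under version 1.0 and accepted under 1.1, while the surrogate references should be rejected under both, which matches the remark that the prologue would help with the record separator but not with emojis.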
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569228#comment-15569228 ]

Konstantin Avdeev commented on CONNECTORS-1325:
---
ok, the XML response has been attached
[jira] [Updated] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Avdeev updated CONNECTORS-1325:
--
Attachment: mcf-bad-ms-char.xml

Bad char in the Title field
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569216#comment-15569216 ]

Konstantin Avdeev commented on CONNECTORS-1325:
---
Oops, the Confluence parser turned the Title text into a readable form :)
Trying again:
ows_Title="Task emoji >>><<<"
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569211#comment-15569211 ]

Konstantin Avdeev commented on CONNECTORS-1325:
---
Hi Karl,
I think the issue can be reproduced easily by putting an emoji (e.g. ) into a field of a task list:
{code}
DEBUG 2016-10-12 18:32:47,521 (Worker thread '72') - SharePoint: getListItems FileRef value 'sites/test-team/Lists/Main Task List/5_.000', xml response: 'http://schemas.microsoft.com/sharepoint/soap/;> '
DEBUG 2016-10-12 18:32:47,522 (Worker thread '72') - SharePoint: Can't get version of '/Main Task List///5_.000' because of bad XML characters(?)
{code}
Thanks!
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567863#comment-15567863 ] Konstantin Avdeev commented on CONNECTORS-1325: --- Thank you, Karl! The patch seems to be working: we were able to complete the crawl. Unfortunately, all documents from that particular library contain this record separator character, so there is no content in the index. We'd need a pre-parsing stage here ;) P.S. Just a note: the complete patch is not yet integrated into v2.5.
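The "pre-parsing stage" mentioned in the comment above could look roughly like the following. This is only a minimal sketch, not the attached patch and not ManifoldCF code: the class `XmlCharRefFilter` and its methods are hypothetical names. It assumes the raw response is available as a string before it reaches the XML parser, and it drops numeric character references whose code point falls outside the XML 1.0 `Char` production, which is the condition behind the "Character reference ... is an invalid XML character" error.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of ManifoldCF or of the CONNECTORS-1325 patches. */
final class XmlCharRefFilter {
    // Matches decimal (&#30;) and hexadecimal (&#x1E;) numeric character references.
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    private XmlCharRefFilter() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes numeric character references that point at characters XML 1.0 forbids. */
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            } catch (NumberFormatException e) {
                cp = -1; // too large to be a code point, so treat it as invalid
            }
            // Keep valid references untouched; replace invalid ones with nothing.
            String replacement = isValidXml10Char(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Note that the filter works on raw text, so it would also rewrite reference-like sequences inside CDATA sections or comments, where they are literal text. Whether silently dropping the character is acceptable, compared with substituting a space or U+FFFD, is a design choice this sketch does not settle.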
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551697#comment-15551697 ] Konstantin Avdeev commented on CONNECTORS-1325: --- sure: {code} DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Getting version of '/DevelopmentDocuments//Test/OP/VM2000248' DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Checking whether to include document '/DevelopmentDocuments/Test/OP/VM2000248' DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: File '/DevelopmentDocuments/Test/OP/VM2000248' exactly matched rule path '/DevelopmentDocuments/Test/OP/?' DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Including file '/DevelopmentDocuments/Test/OP/VM2000248' DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Finding metadata to include for document/item '/DevelopmentDocuments/Test/OP/VM2000248'. DEBUG 2016-10-05 14:45:56,515 (Worker thread '21') - SharePoint: In getFieldValues; fieldNames=[Ljava.lang.String;@6ddaf458, site='', docLibrary='{1C165434-6546-4955-8BF2-05D9632AD202}', docId='/DevelopmentDocuments/Test/OP/VM2000248', dspStsWorks=false DEBUG 2016-10-05 14:45:56,600 (Worker thread '21') - SharePoint: Got a remote exception getting field values for site library {1C165434-6546-4955-8BF2-05D9632AD202} document [/DevelopmentDocuments/Test/OP/VM2000248] - retrying AxisFault faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException faultSubcode: faultString: org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 15716; Character reference "" is an invalid XML character. faultActor: faultNode: faultDetail: {http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 15716; Character reference "" is an invalid XML character. 
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source) at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source) at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source) at org.apache.xerces.impl.XMLScanner.scanCharReferenceValue(Unknown Source) at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source) at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227) at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696) at org.apache.axis.Message.getSOAPEnvelope(Message.java:435) at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62) at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206) at org.apache.axis.client.Call.invokeEngine(Call.java:2784) at org.apache.axis.client.Call.invoke(Call.java:2767) at org.apache.axis.client.Call.invoke(Call.java:2443) at org.apache.axis.client.Call.invoke(Call.java:2366) at 
org.apache.axis.client.Call.invoke(Call.java:1812) at com.microsoft.schemas.sharepoint.soap.ListsSoapStub.getListItems(ListsSoapStub.java:1841) at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2099) at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1433) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) {http://xml.apache.org/axis/}hostname:CDE32616 org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 15716; Character reference "" is an invalid XML character. at org.apache.axis.AxisFault.makeFault(AxisFault.java:101) at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:701) at
[jira] [Commented] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix
[ https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452384#comment-15452384 ] Konstantin Avdeev commented on CONNECTORS-1328: --- I can confirm, the MCF 2.5 running on Unix works with a SP repository now. The ticket can be closed, thanks! > Problem with Sharepoint repositories on Unix > > > Key: CONNECTORS-1328 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1328 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Affects Versions: ManifoldCF 2.4 > Environment: Red Hat (Linux), oracle java 1.8.0 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > > UI cannot check status of a Sharepoint repository throwing an exception: > {code} > [qtp133250414-16] WARN org.eclipse.jetty.servlet.ServletHandler - > org.apache.jasper.JasperException: An exception occurred processing JSP page > /viewconnection.jsp at line 77 > 74: { > 75: try > 76: { > 77: connectionStatus = c.check(); > 78: } > 79: finally > 80: { > Stacktrace: > at > org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521) > at > org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:412) > at > org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) > at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) > at > 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) > at java.lang.Thread.run(Thread.java:745) > Caused by: javax.servlet.ServletException: java.lang.ClassFormatError: Absent > Code attribute in method that is not native or abstract in class file > javax/xml/rpc/ServiceException > at > org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:865) > at > org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:794) > at > org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:528) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) > ... 
23 more > Caused by: java.lang.ClassFormatError: Absent Code attribute in method that > is not native or abstract in class file javax/xml/rpc/ServiceException > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:760) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:368) > at java.net.URLClassLoader$1.run(URLClassLoader.java:362) > at java.security.AccessController.doPrivileged(Native Method) >
[jira] [Commented] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix
[ https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452305#comment-15452305 ] Konstantin Avdeev commented on CONNECTORS-1328: --- I haven't tried that, but it could be an option too. Thanks!
[jira] [Commented] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix
[ https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419125#comment-15419125 ] Konstantin Avdeev commented on CONNECTORS-1328: --- It gets downloaded to {{connector-common-lib/javaee-api-6.0.jar}}; I'm not sure which component depends on it. Maybe {{3.1.0}}
[jira] [Updated] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix
[ https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Avdeev updated CONNECTORS-1328: -- Description: UI cannot check status of a Sharepoint repository throwing an exception: {code} [qtp133250414-16] WARN org.eclipse.jetty.servlet.ServletHandler - org.apache.jasper.JasperException: An exception occurred processing JSP page /viewconnection.jsp at line 77 74: { 75: try 76: { 77: connectionStatus = c.check(); 78: } 79: finally 80: { Stacktrace: at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:412) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at 
org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) at java.lang.Thread.run(Thread.java:745) Caused by: javax.servlet.ServletException: java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/xml/rpc/ServiceException at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:865) at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:794) at org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:528) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) ... 
23 more Caused by: java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/xml/rpc/ServiceException at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:760) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) at java.net.URLClassLoader.access$100(URLClassLoader.java:73) at java.net.URLClassLoader$1.run(URLClassLoader.java:368) at java.net.URLClassLoader$1.run(URLClassLoader.java:362) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:361) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getSession(SharePointRepository.java:331) at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.check(SharePointRepository.java:445) at org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:233) ... 26 more {code} Upgrading {{javaee-api.jar}} to 7.0 solved the problem http://central.maven.org/maven2/javax/javaee-api/7.0/
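When a `ClassFormatError` like the one above points at a single class, it can help to find out which jar that class is served from. The following is a minimal diagnostic sketch under stated assumptions: `ClassOrigin` and `locate` are hypothetical names, not ManifoldCF code, and the caller is assumed to have access to the class loader that loads the connector. The class file is looked up as a plain resource and never defined, so the lookup itself cannot trigger the error.

```java
import java.net.URL;

/** Hypothetical diagnostic helper; not part of ManifoldCF. */
final class ClassOrigin {
    private ClassOrigin() {}

    /**
     * Returns the URL of the .class resource for the given class name as seen by
     * the given class loader, or null if the loader cannot see it. The class is
     * only located as a resource, not loaded or defined.
     */
    static URL locate(ClassLoader loader, String className) {
        String resource = className.replace('.', '/') + ".class";
        return loader.getResource(resource);
    }
}
```

For example, `ClassOrigin.locate(Thread.currentThread().getContextClassLoader(), "javax.xml.rpc.ServiceException")` would return a `jar:file:...` URL naming the jar that supplies the class, if the context class loader is the one in effect.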
[jira] [Created] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix
Konstantin Avdeev created CONNECTORS-1328: - Summary: Problem with Sharepoint repositories on Unix Key: CONNECTORS-1328 URL: https://issues.apache.org/jira/browse/CONNECTORS-1328 Project: ManifoldCF Issue Type: Bug Components: Framework core Affects Versions: ManifoldCF 2.4 Environment: Red Hat (Linux), oracle java 1.8.0 Reporter: Konstantin Avdeev UI cannot check status of a Sharepoint repository throwing an exception: {code} [qtp133250414-16] WARN org.eclipse.jetty.servlet.ServletHandler - org.apache.jasper.JasperException: An exception occurred processing JSP page /viewconnection.jsp at line 77 74: { 75: try 76: { 77: connectionStatus = c.check(); 78: } 79: finally 80: { Stacktrace: at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:412) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) at java.lang.Thread.run(Thread.java:745) Caused by: javax.servlet.ServletException: java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/xml/rpc/ServiceException at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:865) at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:794) at org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:528) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) ... 
23 more Caused by: java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/xml/rpc/ServiceException at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:760) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) at java.net.URLClassLoader.access$100(URLClassLoader.java:73) at java.net.URLClassLoader$1.run(URLClassLoader.java:368) at java.net.URLClassLoader$1.run(URLClassLoader.java:362) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:361) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getSession(SharePointRepository.java:331) at
[jira] [Commented] (CONNECTORS-1286) Solr Plugin: Add support for User Principal
[ https://issues.apache.org/jira/browse/CONNECTORS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389174#comment-15389174 ] Konstantin Avdeev commented on CONNECTORS-1286: --- Hi Karl, yes, I guess he would still like this feature to be implemented :) To be honest, this would be a really good integration with Solr: the user would be able to authenticate out of the box. Thank you! > Solr Plugin: Add support for User Principal > --- > > Key: CONNECTORS-1286 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1286 > Project: ManifoldCF > Issue Type: Improvement > Components: Solr 6.x component >Affects Versions: ManifoldCF 2.3 >Reporter: Konrad Holl >Assignee: Karl Wright >Priority: Minor > Fix For: ManifoldCF 2.5 > > > I’m using ManifoldCF 2.3 with Solr 5.4.1 and the Velocity templating engine. > I needed to do searches with ACLs enabled and installed the plugin. > Unfortunately it is not possible to use the login information provided by > Jetty in the Solr plugin. > As of Solr 5.3 it is possible to extract the authenticated user from the > SolrQueryRequest object: > http://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/request/SolrQueryRequest.html#getUserPrincipal(). > I added these lines to the code in > org.apache.solr.mcf.ManifoldCFSearchComponent before the evaluation of > parameters for authenticated user name: > {code} > String authDomain = (String)args.get("AuthDomain"); > if (rb.req.getUserPrincipal() != null) { > domainMap.put("", rb.req.getUserPrincipal().getName() + > ((authDomain == null) ? "" : "@" + authDomain)); > } > else { > // Get the authenticated user name from the parameters > {code} > I also needed an additional setting “authDomain” in the search component > configuration (solrconfig.xml). Now I can use Velocity even for documents > with ACLs :o) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
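The name-building rule in the quoted snippet can be isolated from Solr for illustration. This is a sketch only: `PrincipalNames` and `qualifiedUserName` are hypothetical names, and the code uses nothing but `java.security.Principal`, so it does not show how the value is wired into `ManifoldCFSearchComponent`. As in the snippet, the domain is appended only when `authDomain` is non-null.

```java
import java.security.Principal;

/** Hypothetical helper mirroring the rule in the quoted snippet; not Solr or ManifoldCF code. */
final class PrincipalNames {
    private PrincipalNames() {}

    /**
     * Returns "name@authDomain" when a domain is configured, the bare principal
     * name when it is not, and null when the request carries no principal (the
     * caller would then fall back to the request parameters).
     */
    static String qualifiedUserName(Principal principal, String authDomain) {
        if (principal == null) {
            return null;
        }
        return principal.getName() + ((authDomain == null) ? "" : "@" + authDomain);
    }
}
```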
[jira] [Commented] (CONNECTORS-1324) Not all SharePoint Metadata Fields are returned - 2
[ https://issues.apache.org/jira/browse/CONNECTORS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341971#comment-15341971 ] Konstantin Avdeev commented on CONNECTORS-1324: --- Thank you very much for the prompt feedback and for the patch! Just an idea: it would be great to have the up-to-date mapping right in the search index, so, the frontend could implement a logic to map the names back, e.g.: {{Name="internal_name" DisplayName="pretty_name"}} would produce the following meta-data: {code:javascript} "internal_name": "value", "internal_name_DisplayName": "pretty_name", {code} What do you think? Thanks again! > Not all SharePoint Metadata Fields are returned - 2 > --- > > Key: CONNECTORS-1324 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1324 > Project: ManifoldCF > Issue Type: Bug > Components: SharePoint connector >Affects Versions: ManifoldCF 2.4 > Environment: Java 1.8, Windows x64, Sharepoint 2013 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > Attachments: CONNECTORS-1324.patch > > > Hello Karl, > This is a follow up ticket for > [1284|https://issues.apache.org/jira/browse/CONNECTORS-1284]. > There is still a problem with getting meta-data from sharepoint lists. > E.g. I'm missing the "title" and "description" fields in the result documents. > Let's take the "title" field from a list document: > {code:xml|title=DEBUG: SharePoint: getFieldList xml response:} > ... > Name="Title" > DisplayName="Task Name" Required="TRUE" > SourceID="http://schemas.microsoft.com/sharepoint/v3; > StaticName="Title" FromBaseType="TRUE" Sealed="TRUE" ColName="nvarchar1" > /> > ... > {code} > The field {{Name}} is the internal (technical) field name, and > {{DisplayName}} is the frontend (user-friendly) name. 
> The connector maps {{Name}} to {{DisplayName}} when it is preparing the > request: > {code:java|title=SharePointRepository.java} > for (String field : fieldNames.keySet()) > { > String value = fieldNames.get(field); > fields[j++] = (value==null)?field:value; > {code} > Changing the last two lines to: > {code:java} > fields[j++] = field; > {code} > solves the problem. > I doubt the {{DisplayName}} should be used at all, as it can contain > non-ASCII chars, e.g.: > {code:xml} > ... > Name="Title" DisplayName="Überschrift" ... > {code} > Currently, only fields with the same Name/DisplayName values can be indexed. > Could you please look into this? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
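The two ideas in this thread (request fields by their internal {{Name}}, and optionally carry the {{DisplayName}} along as extra metadata) can be sketched in a few lines. This is an illustration under assumptions, not the connector's code: the class `FieldNameMappingSketch` and its methods are hypothetical, and `fieldNames` is assumed to map internal names to display names, as in the quoted loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not taken from SharePointRepository.java.
final class FieldNameMappingSketch {

    // Always request fields by their internal name (the map key),
    // never by the display name, which may contain non-ASCII characters.
    static String[] requestFields(Map<String, String> fieldNames) {
        return fieldNames.keySet().toArray(new String[0]);
    }

    // Emits "internal_name" -> value plus "internal_name_DisplayName" -> pretty name,
    // so a frontend can map the technical names back to user-friendly ones.
    static Map<String, String> withDisplayNames(Map<String, String> fieldNames,
                                                Map<String, String> values) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            String internal = e.getKey();
            out.put(internal, e.getValue());
            String display = fieldNames.get(internal);
            if (display != null) {
                out.put(internal + "_DisplayName", display);
            }
        }
        return out;
    }
}
```

With `Title` mapped to `Task Name`, the metadata for a document would then contain both `Title` and `Title_DisplayName`, matching the JSON shape proposed in the comment.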
[jira] [Created] (CONNECTORS-1324) Not all SharePoint Metadata Fields are returned - 2
Konstantin Avdeev created CONNECTORS-1324: - Summary: Not all SharePoint Metadata Fields are returned - 2 Key: CONNECTORS-1324 URL: https://issues.apache.org/jira/browse/CONNECTORS-1324 Project: ManifoldCF Issue Type: Bug Components: SharePoint connector Affects Versions: ManifoldCF 2.4 Environment: Java 1.8, Windows x64, Sharepoint 2013 Reporter: Konstantin Avdeev Hello Karl, This is a follow up ticket for [1284|https://issues.apache.org/jira/browse/CONNECTORS-1284]. There is still a problem with getting meta-data from sharepoint lists. E.g. I'm missing the "title" and "description" fields in the result documents. Let's take the "title" field from a list document: {code:xml|title=DEBUG: SharePoint: getFieldList xml response:} ... http://schemas.microsoft.com/sharepoint/v3; StaticName="Title" FromBaseType="TRUE" Sealed="TRUE" ColName="nvarchar1" /> ... {code} The field {{Name}} is the internal (technical) field name, and {{DisplayName}} is the frontend (user-friendly) name. The connector maps {{Name}} to {{DisplayName}} when it is preparing the request: {code:java|title=SharePointRepository.java} for (String field : fieldNames.keySet()) { String value = fieldNames.get(field); fields[j++] = (value==null)?field:value; {code} Changing the last two lines to: {code:java} fields[j++] = field; {code} solves the problem. I doubt the {{DisplayName}} should be used at all, as it can contain non-ASCII chars, e.g.: {code:xml} ...
[jira] [Commented] (CONNECTORS-1312) jcifs.smb.SmbException: Connection reset by peer: socket write error
[ https://issues.apache.org/jira/browse/CONNECTORS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275360#comment-15275360 ] Konstantin Avdeev commented on CONNECTORS-1312: --- yes, we are crawling the server too "hard" - there are a lot of "all pipes are busy" warnings, but the file server doesn't seem to be under heavy load. So, could we add the "connection reset by peer" to the list of non-severe exceptions? Thanks! > jcifs.smb.SmbException: Connection reset by peer: socket write error > > > Key: CONNECTORS-1312 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1312 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.5 > Environment: Windows x64, java 1.8.x >Reporter: Konstantin Avdeev > > hi Karl, > we've found another JCIFS exception: Windows share jobs stop when > encountering a "Connection reset by peer" error, e.g.: > {code} > ERROR 2016-05-03 15:29:24,209 (Worker thread '80') - JCIFS: SmbException > tossed processing smb://server.domain.com/path/file.ppt > jcifs.smb.SmbException: Connection reset by peer: socket write error > java.net.SocketException: Connection reset by peer: socket write error > at java.net.SocketOutputStream.socketWrite0(Native Method) > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) > at java.net.SocketOutputStream.write(SocketOutputStream.java:153) > at jcifs.smb.SmbTransport.doSend(SmbTransport.java:453) > at jcifs.util.transport.Transport.sendrecv(Transport.java:67) > at jcifs.smb.SmbTransport.send(SmbTransport.java:655) > at jcifs.smb.SmbSession.send(SmbSession.java:238) > at jcifs.smb.SmbTree.send(SmbTree.java:119) > at jcifs.smb.SmbFile.send(SmbFile.java:775) > at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181) > at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at 
java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at java.io.FilterInputStream.read(FilterInputStream.java:107) > at java.nio.file.Files.copy(Files.java:2908) > at java.nio.file.Files.copy(Files.java:3027) > at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587) > at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615) > at > org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:358) > at > org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:424) > at > org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) > at > org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48) > at > org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227) > at > org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224) > at > org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075) > at > org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706) > at > org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) > at > org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) > at > org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) > at > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:979) > at > 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) > {code} > Current workaround - to start the job again (manually or by the scheduler). > It is clear, that there are many errors, when it makes no sense to skip a > failed URL and continue the job, e.g.: > {code} > Error: SmbAuthException thrown: Logon failure: unknown user name or bad > password. > {code} > I'm thinking about a general solution, like defining a list (through the UI > or properties.xml) with non severe exceptions, like "file busy" or "symlink > detected" etc, so the admins would be able to specify, when the crawler > should stop and when it should retry, skip and go further. > What do you
[jira] [Commented] (CONNECTORS-1311) Dependencies issues
[ https://issues.apache.org/jira/browse/CONNECTORS-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271450#comment-15271450 ] Konstantin Avdeev commented on CONNECTORS-1311: --- Thank you for the quick fix for the first issue! Would you mind commenting on the others?.. > Dependencies issues > --- > > Key: CONNECTORS-1311 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1311 > Project: ManifoldCF > Issue Type: Bug > Components: Build >Affects Versions: ManifoldCF 2.5 > Environment: any >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > > There are several issues with the dependencies: > 1) POI should be 3.13, since tika 1.12 uses that version. With POI 3.14 tika > cannot parse presentation files (ppt): > {code} > FATAL 2016-05-03 10:39:16,821 (Worker thread '0') - Error tossed: > org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder; > java.lang.NoSuchMethodError: > org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder; > at > org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.extractContent(XSLFPowerPointExtractorDecorator.java:154) > at > org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:88) > at > org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at > org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48) > {code} 
> 2) jcifs "1.3.17" is used currently. Available is "1.3.18". > 3) Java Advanced Imaging (JAI), jbig2 format libs are not included, but > required for parsing embedded images. > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CONNECTORS-1312) jcifs.smb.SmbException: Connection reset by peer: socket write error
Konstantin Avdeev created CONNECTORS-1312: - Summary: jcifs.smb.SmbException: Connection reset by peer: socket write error Key: CONNECTORS-1312 URL: https://issues.apache.org/jira/browse/CONNECTORS-1312 Project: ManifoldCF Issue Type: Bug Components: JCIFS connector Affects Versions: ManifoldCF 2.5 Environment: Windows x64, java 1.8.x Reporter: Konstantin Avdeev hi Karl, we've found another JCIFS exception: Windows share jobs stop when encountering a "Connection reset by peer" error, e.g.: {code} ERROR 2016-05-03 15:29:24,209 (Worker thread '80') - JCIFS: SmbException tossed processing smb://server.domain.com/path/file.ppt jcifs.smb.SmbException: Connection reset by peer: socket write error java.net.SocketException: Connection reset by peer: socket write error at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at jcifs.smb.SmbTransport.doSend(SmbTransport.java:453) at jcifs.util.transport.Transport.sendrecv(Transport.java:67) at jcifs.smb.SmbTransport.send(SmbTransport.java:655) at jcifs.smb.SmbSession.send(SmbSession.java:238) at jcifs.smb.SmbTree.send(SmbTree.java:119) at jcifs.smb.SmbFile.send(SmbFile.java:775) at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181) at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at java.io.FilterInputStream.read(FilterInputStream.java:107) at java.nio.file.Files.copy(Files.java:2908) at java.nio.file.Files.copy(Files.java:3027) at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587) at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615) at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:358) at 
org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:424) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48) at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227) at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224) at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075) at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706) at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:979) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) {code} Current workaround - to start the job again (manually or by the scheduler). It is clear, that there are many errors, when it makes no sense to skip a failed URL and continue the job, e.g.: {code} Error: SmbAuthException thrown: Logon failure: unknown user name or bad password. 
{code} I'm thinking about a general solution, like defining a list (through the UI or properties.xml) with non severe exceptions, like "file busy" or "symlink detected" etc, so the admins would be able to specify, when the crawler should stop and when it should retry, skip and go further. What do you think? Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
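The "list of non-severe exceptions" proposed above could look roughly like the following sketch. It is only an illustration of the idea, assuming that rules are matched against exception messages; the class `ExceptionPolicySketch`, the `Action` values and the `rule`/`classify` methods are all hypothetical and do not exist in ManifoldCF.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical policy table: message fragment -> what the crawler should do.
final class ExceptionPolicySketch {

    enum Action { ABORT, RETRY, SKIP }

    private final Map<String, Action> rules = new LinkedHashMap<>();

    // Registers a case-insensitive message fragment; rules are checked in insertion order.
    ExceptionPolicySketch rule(String messageFragment, Action action) {
        rules.put(messageFragment.toLowerCase(Locale.ROOT), action);
        return this;
    }

    // Walks the cause chain (bounded, to be safe against odd chains) and returns
    // the action of the first matching rule. Unknown errors abort the job.
    Action classify(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 16; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            String lower = message.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, Action> e : rules.entrySet()) {
                if (lower.contains(e.getKey())) {
                    return e.getValue();
                }
            }
        }
        return Action.ABORT;
    }
}
```

An admin-supplied configuration would then boil down to calls such as `rule("connection reset by peer", Action.RETRY)` and `rule("logon failure", Action.ABORT)`. Defaulting to ABORT keeps the current behaviour for anything that is not listed.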
[jira] [Created] (CONNECTORS-1311) Dependencies issues
Konstantin Avdeev created CONNECTORS-1311: - Summary: Dependencies issues Key: CONNECTORS-1311 URL: https://issues.apache.org/jira/browse/CONNECTORS-1311 Project: ManifoldCF Issue Type: Bug Components: Build Affects Versions: ManifoldCF 2.5 Environment: any Reporter: Konstantin Avdeev There are several issues with the dependencies: 1) POI should be 3.13, since tika 1.12 uses that version. With POI 3.14 tika cannot parse presentation files (ppt): {code} FATAL 2016-05-03 10:39:16,821 (Worker thread '0') - Error tossed: org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder; java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder; at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.extractContent(XSLFPowerPointExtractorDecorator.java:154) at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:88) at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110) at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112) at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48) {code} 2) jcifs "1.3.17" is used currently. Available is "1.3.18". 3) Java Advanced Imaging (JAI), jbig2 format libs are not included, but required for parsing embedded images. Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205
[ https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265421#comment-15265421 ] Konstantin Avdeev commented on CONNECTORS-1305: --- Seems to be working! before: {code} ERROR 2016-04-30 20:35:17,664 (Worker thread '11') - JCIFS: SmbException tossed processing smb://localhost/share/longDir-0/longDir-1/longDir-2/longDir-3/longDir-4/longDir-5/longDir-6/longDir-7/longDir-8/longDir-9/longDir-10/longDir-11/longDir-12/longDir-13/longDir-14/longDir-15/longDir-16/longDir-17/longDir-18/longDir-19/longDir-20/longDir-21/longDir-22/longDir-23/longDir-24/longDir-25/longDir-26/longDir-27/longDir-28/longDir-29/longDir-30/longDir-31/longDir-32/longDir-33/longDir-34/longDir-35/longDir-36/longDir-37/longDir-38/longDir-39/longDir-40/longDir-41/longDir-42/longDir-43/longDir-44/longDir-45/ jcifs.smb.SmbException: 0xC0000205 INFO 2016-04-30 20:35:17,671 (Worker thread '11') - Aborting job 1460647853267 due to error 'SmbException tossed: 0xC0000205' {code} after: {code} WARN 2016-04-30 20:38:45,769 (Worker thread '86') - JCIFS: Out of resources exception reading document/directory smb://localhost/share/longDir-0/longDir-1/longDir-2/longDir-3/longDir-4/longDir-5/longDir-6/longDir-7/longDir-8/longDir-9/longDir-10/longDir-11/longDir-12/longDir-13/longDir-14/longDir-15/longDir-16/longDir-17/longDir-18/longDir-19/longDir-20/longDir-21/longDir-22/longDir-23/longDir-24/longDir-25/longDir-26/longDir-27/longDir-28/longDir-29/longDir-30/longDir-31/longDir-32/longDir-33/longDir-34/longDir-35/longDir-36/longDir-37/longDir-38/longDir-39/longDir-40/longDir-41/longDir-42/longDir-43/longDir-44/longDir-45/ - skipping {code} Thank you for the patch! 
> Windows Share connector: SmbException tossed: 0xC0000205 > > > Key: CONNECTORS-1305 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1305 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.4 > Environment: Windows server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > Attachments: CONNECTORS-1305.patch > > > Windows share jobs stop when encountering an [Insufficient server resources > exist to complete the > request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply > (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES). > Is it possible to catch that exception as well? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
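The before/after logs above show the job abort turning into a per-document warning. The contents of the attached patch are not shown in this thread, so the following is only a guess at the shape of such a check: a sketch that treats the NT status value as a plain `int` (assumed to be obtained from the SMB exception by the caller) and decides whether the document should be skipped. The class `NtStatusPolicySketch` and its methods are hypothetical.

```java
// Hypothetical sketch; the real change is in CONNECTORS-1305.patch, which is not reproduced here.
final class NtStatusPolicySketch {

    // NTSTATUS value for STATUS_INSUFF_SERVER_RESOURCES, as named in the issue.
    static final int STATUS_INSUFF_SERVER_RESOURCES = 0xC0000205;

    // True when the error concerns only the current document/directory,
    // so the crawler can log a warning and move on instead of aborting the job.
    static boolean isSkippable(int ntStatus) {
        return ntStatus == STATUS_INSUFF_SERVER_RESOURCES;
    }

    // Formats the status as eight hex digits, the way it appears in the log lines.
    static String describe(int ntStatus) {
        return String.format("0x%08X", ntStatus);
    }
}
```

Comparing on the numeric status rather than on the message text avoids depending on how the library renders unknown codes.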
[jira] [Commented] (CONNECTORS-1307) Tika extractor infinite loop on error
[ https://issues.apache.org/jira/browse/CONNECTORS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264108#comment-15264108 ] Konstantin Avdeev commented on CONNECTORS-1307: --- Tika 1.12 solved the StackOverflowError. But, as you say, it has a lot of dependencies, and they have changed! So, I have to figure out, which libs needs to be updated as well... Thank you for you help as usual! > Tika extractor infinite loop on error > - > > Key: CONNECTORS-1307 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1307 > Project: ManifoldCF > Issue Type: Bug > Components: Tika extractor >Affects Versions: ManifoldCF 2.4 > Environment: windows 64bit, java version "1.8.0_77", > pdfbox-1.8.10.jar, tika-parsers-1.10.jar >Reporter: Konstantin Avdeev > > The Tika extractor gets stuck (is trying to parse the same document again and > again) on the following error: > {code} > FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null > java.lang.StackOverflowError > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250) > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264) > at > org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254) > at > org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > {code} > -Xss - is the default one, which is, I believe, 512k. > We can increase the stack trace size, but I think, this error should not lead > to such situation. > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205
[ https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263950#comment-15263950 ] Konstantin Avdeev commented on CONNECTORS-1305: --- It seems to be a very long path, created by a deployment script or something like that. I can't even navigate deeper using the windows file explorer - it just gets stuck. When the server replies with the status "insufficient resource", it'd make sense to skip that URL and re-check it next time (of course, it will never work for that particular case), but not to cancel the whole job. That's only my opinion. Thanks! > Windows Share connector: SmbException tossed: 0xC0000205 > > > Key: CONNECTORS-1305 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1305 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.4 > Environment: Windows server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > > Windows share jobs stop when encountering an [Insufficient server resources > exist to complete the > request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply > (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES). > Is it possible to catch that exception as well? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1307) Tika extractor infinite loop on error
[ https://issues.apache.org/jira/browse/CONNECTORS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263938#comment-15263938 ] Konstantin Avdeev commented on CONNECTORS-1307: --- Fair enough :) is it a worth of trying to replace the tika and pdfbox libraries by the latest ones? Thanks! > Tika extractor infinite loop on error > - > > Key: CONNECTORS-1307 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1307 > Project: ManifoldCF > Issue Type: Bug > Components: Tika extractor >Affects Versions: ManifoldCF 2.4 > Environment: windows 64bit, java version "1.8.0_77", > pdfbox-1.8.10.jar, tika-parsers-1.10.jar >Reporter: Konstantin Avdeev > > The Tika extractor gets stuck (is trying to parse the same document again and > again) on the following error: > {code} > FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null > java.lang.StackOverflowError > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250) > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at > org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264) > at > org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254) > at > org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > at > org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) > {code} > -Xss - is the default one, which is, I believe, 512k. > We can increase the stack trace size, but I think, this error should not lead > to such situation. > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CONNECTORS-1307) Tika extractor infinite loop on error
Konstantin Avdeev created CONNECTORS-1307: - Summary: Tika extractor infinite loop on error Key: CONNECTORS-1307 URL: https://issues.apache.org/jira/browse/CONNECTORS-1307 Project: ManifoldCF Issue Type: Bug Components: Tika extractor Affects Versions: ManifoldCF 2.4 Environment: windows 64bit, java version "1.8.0_77", pdfbox-1.8.10.jar, tika-parsers-1.10.jar Reporter: Konstantin Avdeev The Tika extractor gets stuck (is trying to parse the same document again and again) on the following error: {code} FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null java.lang.StackOverflowError at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250) at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264) at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254) at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319) {code} -Xss is the default one, which is, I believe, 512k. We can increase the stack size, but I think this error should not lead to such a situation. Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
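Both points in the report (a larger stack, and not looping on the same document) can be illustrated without touching Tika. The sketch below runs an arbitrary parse task on a dedicated thread with a requested stack size and turns a `StackOverflowError` into a "skip this document" outcome. It is an assumption-laden illustration: `GuardedParseSketch`, `Outcome` and `parseGuarded` are hypothetical names, and the JVM is free to treat the requested stack size as a hint only.

```java
// Hypothetical guard around a parse call; not how ManifoldCF or Tika handle this.
final class GuardedParseSketch {

    enum Outcome { OK, SKIPPED, FAILED, INTERRUPTED }

    // Runs the task on its own thread with the requested stack size (a hint to the JVM).
    // A StackOverflowError is reported as SKIPPED so the caller can move on to the next
    // document instead of retrying the same one forever.
    static Outcome parseGuarded(Runnable parse, long stackBytes) {
        final Outcome[] result = { Outcome.FAILED };
        Thread worker = new Thread(null, () -> {
            try {
                parse.run();
                result[0] = Outcome.OK;
            } catch (StackOverflowError e) {
                result[0] = Outcome.SKIPPED;
            }
            // Any other throwable ends the thread and leaves the result as FAILED.
        }, "guarded-parse", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.INTERRUPTED;
        }
        return result[0];
    }
}
```

A caller would pass the actual parse invocation as the `Runnable` and a stack size of its choosing (for example 1 MiB), and would record a SKIPPED document rather than requeue it.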
[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205
[ https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262601#comment-15262601 ] Konstantin Avdeev commented on CONNECTORS-1305: --- FYI: I don't use DFS paths, the connector crawls the servers directly. > Windows Share connector: SmbException tossed: 0xC0000205 > > > Key: CONNECTORS-1305 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1305 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.4 > Environment: Windows server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > > Windows share jobs stop when encountering an [Insufficient server resources > exist to complete the > request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply > (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES). > Is it possible to catch that exception as well? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205
[ https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262516#comment-15262516 ] Konstantin Avdeev edited comment on CONNECTORS-1305 at 4/28/16 4:59 PM: I think, it's a permanent one. Fortunately, I've got the full stack trace for that: it turned out, that the problem is a very long path: {code} ERROR 2016-04-28 17:34:43,332 (Worker thread '89') - JCIFS: SmbException tossed processing smb://server.domain.com/Share/dir1/dir2/dir3/dir4/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/.metadata/.plugins/com.collabnet.subversion.merge/ jcifs.smb.SmbException: 0xC205 {code} The connector tried to get there four times, permanently getting the error message from the server, and canceled the job. Basically, it is not able to complete the job because of the weird path. So, such kind of exception should be a WARN not an ERROR. Thanks! P.S. I've added _dir4/repeating-dir_ to the exclude list, hopefully, it'd skip that bad dir on the next run. was (Author: kavdeev): I think, it's a permanent one. 
Fortunately, I've got the full stack trace for that: it turned out, that the problem is a very long path: ERROR 2016-04-28 17:34:43,332 (Worker thread '89') - JCIFS: SmbException tossed processing smb://server.domain.com/Share/dir1/dir2/dir3/dir4/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/.metadata/.plugins/com.collabnet.subversion.merge/ jcifs.smb.SmbException: 0xC205 The connector tried to get there four times, permanently getting the error message from the server, and canceled the job. Basically, it is not able to complete the job because of the weird path. So, such kind of exception should be a WARN not an ERROR. Thanks! > Windows Share connector: SmbException tossed: 0xC205 > > > Key: CONNECTORS-1305 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1305 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.4 > Environment: Windows server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > > Windows share jobs stop when encountering an [Insufficient server resources > exist to complete the > request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply > (0xC205 - STATUS_INSUFF_SERVER_RESOURCES). > Is it possible to catch that exception as well? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205
[ https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262516#comment-15262516 ] Konstantin Avdeev commented on CONNECTORS-1305: --- I think, it's a permanent one. Fortunately, I've got the full stack trace for that: it turned out, that the problem is a very long path: ERROR 2016-04-28 17:34:43,332 (Worker thread '89') - JCIFS: SmbException tossed processing smb://server.domain.com/Share/dir1/dir2/dir3/dir4/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/.metadata/.plugins/com.collabnet.subversion.merge/ jcifs.smb.SmbException: 0xC205 The connector tried to get there four times, permanently getting the error message from the server, and canceled the job. Basically, it is not able to complete the job because of the weird path. So, such kind of exception should be a WARN not an ERROR. Thanks! > Windows Share connector: SmbException tossed: 0xC205 > > > Key: CONNECTORS-1305 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1305 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.4 > Environment: Windows server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.5 > > > Windows share jobs stop when encountering an [Insufficient server resources > exist to complete the > request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply > (0xC205 - STATUS_INSUFF_SERVER_RESOURCES). > Is it possible to catch that exception as well? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205
Konstantin Avdeev created CONNECTORS-1305: - Summary: Windows Share connector: SmbException tossed: 0xC0000205 Key: CONNECTORS-1305 URL: https://issues.apache.org/jira/browse/CONNECTORS-1305 Project: ManifoldCF Issue Type: Bug Components: JCIFS connector Affects Versions: ManifoldCF 2.4 Environment: Windows server 2012 Reporter: Konstantin Avdeev Windows share jobs stop when encountering an [Insufficient server resources exist to complete the request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES). Is it possible to catch that exception as well? Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
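To illustrate the request above (catch this status instead of stopping the job), here is a minimal sketch of a status-to-action policy. The class `NtStatusPolicy`, its `Action` enum and the choice to skip the document are all hypothetical and are not taken from the ManifoldCF connector; only the status value 0xC0000205 comes from the issue. In a real connector the status would presumably be read from the caught jCIFS exception, which is an assumption here.

```java
/** Hypothetical policy sketch; not ManifoldCF's actual connector code. */
final class NtStatusPolicy {
    /** What the crawler might do after a failed SMB call (illustrative only). */
    enum Action { SKIP_DOCUMENT, ABORT_JOB }

    // NT status value quoted in the issue (STATUS_INSUFF_SERVER_RESOURCES).
    static final int STATUS_INSUFF_SERVER_RESOURCES = 0xC0000205;

    /** Formats a status as eight hex digits, the way it appears in the issue title. */
    static String hex(int ntStatus) {
        return String.format("0x%08X", ntStatus);
    }

    /** Maps a status code to an action; unknown codes still abort the job. */
    static Action actionFor(int ntStatus) {
        switch (ntStatus) {
            case STATUS_INSUFF_SERVER_RESOURCES:
                // Assumption: log a warning, skip this path, keep the job running.
                return Action.SKIP_DOCUMENT;
            default:
                return Action.ABORT_JOB;
        }
    }

    public static void main(String[] args) {
        int status = STATUS_INSUFF_SERVER_RESOURCES;
        System.out.println(hex(status) + " -> " + actionFor(status));
    }
}
```

Keeping the mapping in one place would make it easy to add further codes later without touching the crawl loop.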
[jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?
[ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244745#comment-15244745 ] Konstantin Avdeev commented on CONNECTORS-1299: --- BTW - the current trunk version behaves exactly in the same way. > "Seeding" phase of a job prevents starting others? > -- > > Key: CONNECTORS-1299 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1299 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent > Environment: Windows >Reporter: Konstantin Avdeev > > Hello Karl, could you please clarify if this is a bug or a feature? :) > When I start an smb job for a share containing a lot of files (can be > reproduced with a \Windows directory :)) and then start a second job, the > last one remains some time (depends on amount of data processing by the first > one) with the status "running", but showing {{"Active=1"}} and does not > progress. > Setting log level to Debug did not shed a light on this, unfortunately. > It would be great, if could elaborate on that a little! > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?
[ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244744#comment-15244744 ] Konstantin Avdeev commented on CONNECTORS-1299: --- Thanks a lot for the explanation! Just curious - is it also described in the book: https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/ ? If yes, I'm going to read this :) -- best regards, KA > "Seeding" phase of a job prevents starting others? > -- > > Key: CONNECTORS-1299 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1299 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent > Environment: Windows >Reporter: Konstantin Avdeev > > Hello Karl, could you please clarify if this is a bug or a feature? :) > When I start an smb job for a share containing a lot of files (can be > reproduced with a \Windows directory :)) and then start a second job, the > last one remains some time (depends on amount of data processing by the first > one) with the status "running", but showing {{"Active=1"}} and does not > progress. > Setting log level to Debug did not shed a light on this, unfortunately. > It would be great, if could elaborate on that a little! > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1297) Windows Share job stops upon a symbolic link
[ https://issues.apache.org/jira/browse/CONNECTORS-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244743#comment-15244743 ] Konstantin Avdeev commented on CONNECTORS-1297: --- Great support again, Thank you very much! {code} INFO 2016-04-17 19:11:58,344 (Startup thread) - Marked job 1460829266713 for startup INFO 2016-04-17 19:12:15,642 (Startup thread) - Job 1460829266713 is now started WARN 2016-04-17 19:15:19,709 (Worker thread '74') - JCIFS: Symlink detected: smb://localhost/C$/Program Files/Java/jdk/ DEBUG 2016-04-17 19:15:42,943 (Idle cleanup thread) - Connection manager is shutting down DEBUG 2016-04-17 19:15:42,943 (Idle cleanup thread) - http-outgoing-1: Close connection DEBUG 2016-04-17 19:15:42,943 (Idle cleanup thread) - Connection manager shut down {code} > Windows Share job stops upon a symbolic link > > > Key: CONNECTORS-1297 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1297 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.3 > Environment: Windows 2012, Windows 10 >Reporter: Konstantin Avdeev > Attachments: CONNECTORS-1297.patch > > > Windows shares having a symbolic link cannot be crawled: the job stop with > the exception: > {code} > Error: SmbException tossed: 0x802D > {code} > Stack trace example: > {code} > ERROR 2016-04-16 20:01:40,384 (Worker thread '76') - JCIFS: SmbException > tossed processing smb://localhost/C$/Program Files/Java/jdk/ > jcifs.smb.SmbException: 0x802D > at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) > at jcifs.smb.SmbTransport.send(SmbTransport.java:640) > at jcifs.smb.SmbSession.send(SmbSession.java:238) > at jcifs.smb.SmbTree.send(SmbTree.java:119) > at jcifs.smb.SmbFile.send(SmbFile.java:775) > at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989) > at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) > at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) > at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) > at > 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2295) > at > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:788) > at > org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) > {code} > jdk is a symbolic directory link > {code} > 02.03.2016 12:38 jdk [jdk1.8.0] > {code} > Expected behaviour: treat a link as an usual directory/file. Or at least, > skip it and continue the job. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1297) Windows Share job stops upon a symbolic link
[ https://issues.apache.org/jira/browse/CONNECTORS-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244536#comment-15244536 ] Konstantin Avdeev commented on CONNECTORS-1297: --- ok, make sense, but in the meantime, couldn't we just catch that particular exception in order to process the job further? Thanks! > Windows Share job stops upon a symbolic link > > > Key: CONNECTORS-1297 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1297 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.3 > Environment: Windows 2012, Windows 10 >Reporter: Konstantin Avdeev > > Windows shares having a symbolic link cannot be crawled: the job stop with > the exception: > {code} > Error: SmbException tossed: 0x802D > {code} > Stack trace example: > {code} > ERROR 2016-04-16 20:01:40,384 (Worker thread '76') - JCIFS: SmbException > tossed processing smb://localhost/C$/Program Files/Java/jdk/ > jcifs.smb.SmbException: 0x802D > at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) > at jcifs.smb.SmbTransport.send(SmbTransport.java:640) > at jcifs.smb.SmbSession.send(SmbSession.java:238) > at jcifs.smb.SmbTree.send(SmbTree.java:119) > at jcifs.smb.SmbFile.send(SmbFile.java:775) > at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989) > at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) > at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) > at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) > at > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2295) > at > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:788) > at > org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) > {code} > jdk is a symbolic directory link > {code} > 02.03.2016 12:38 jdk [jdk1.8.0] > {code} > Expected behaviour: treat a link as an usual directory/file. 
Or at least, > skip it and continue the job. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
[ https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Avdeev updated CONNECTORS-1298: -- Description: Every MCF restart leaves out webapp temp dirs like this: {code} C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/ C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/ C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/ {code} (or under {{java.io.tmpdir}}, if set) Expected behaviour: delete these dir upon exit. Could it help to set jetty's {{persistTempDirectory}} to false for these contextes? Thank you! was: Every MCF restart leaves out webapp temp dirs like this: {code} C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/ C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/ C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/ {code} (or under {{java.io.tmpdir}}, if set) Expected behaviour: delete these dir upon exit. Could it help to set jetty's {{persistTempDirectory}} to true for these contextes? Thank you! 
> Houskeeping: jetty webapp temp directories don't get removed upon shutdown > -- > > Key: CONNECTORS-1298 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1298 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core > Environment: Windows >Reporter: Konstantin Avdeev >Priority: Minor > > Every MCF restart leaves out webapp temp dirs like this: > {code} > C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/ > C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/ > C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/ > {code} > (or under {{java.io.tmpdir}}, if set) > Expected behaviour: delete these dir upon exit. > Could it help to set jetty's {{persistTempDirectory}} to false for these > contextes? > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?
[ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Avdeev updated CONNECTORS-1299: -- Description: Hello Karl, could you please clarify if this is a bug or a feature? :) When I start an smb job for a share containing a lot of files (can be reproduced with a \Windows directory :)) and then start a second job, the last one remains some time (depends on amount of data processing by the first one) with the status "running", but showing {{"Active=1"}} and does not progress. Setting log level to Debug did not shed a light on this, unfortunately. It would be great, if could elaborate on that a little! Thank you! was: Hello Karl, could you please clarify if this is a bug or a feature? :) When I start an smb job for a share containing a lot of files (can be reproduced with a \Windows directory :)) and then start a second job, the last one remains some time (depends on amount of data processing by the first one) with the status "running", but showing {{"Active=1"}} and does not progress. Setting Debug=true did not shed a light on this, unfortunately. It would be great, if could elaborate on that a little! Thank you! > "Seeding" phase of a job prevents starting others? > -- > > Key: CONNECTORS-1299 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1299 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent > Environment: Windows >Reporter: Konstantin Avdeev > > Hello Karl, could you please clarify if this is a bug or a feature? :) > When I start an smb job for a share containing a lot of files (can be > reproduced with a \Windows directory :)) and then start a second job, the > last one remains some time (depends on amount of data processing by the first > one) with the status "running", but showing {{"Active=1"}} and does not > progress. > Setting log level to Debug did not shed a light on this, unfortunately. 
> It would be great, if could elaborate on that a little! > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?
[ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Avdeev updated CONNECTORS-1299: -- Summary: "Seeding" phase of a job prevents starting others? (was: "Seeding" phase of a job prevent starting others?) > "Seeding" phase of a job prevents starting others? > -- > > Key: CONNECTORS-1299 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1299 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent > Environment: Windows >Reporter: Konstantin Avdeev > > Hello Karl, could you please clarify if this is a bug or a feature? :) > When I start an smb job for a share containing a lot of files (can be > reproduced with a \Windows directory :)) and then start a second job, the > last one remains some time (depends on amount of data processing by the first > one) with the status "running", but showing {{"Active=1"}} and does not > progress. > Setting Debug=true did not shed a light on this, unfortunately. > It would be great, if could elaborate on that a little! > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CONNECTORS-1299) "Seeding" phase of a job prevent starting others?
Konstantin Avdeev created CONNECTORS-1299: - Summary: "Seeding" phase of a job prevent starting others? Key: CONNECTORS-1299 URL: https://issues.apache.org/jira/browse/CONNECTORS-1299 Project: ManifoldCF Issue Type: Bug Components: Framework crawler agent Environment: Windows Reporter: Konstantin Avdeev Hello Karl, could you please clarify if this is a bug or a feature? :) When I start an smb job for a share containing a lot of files (can be reproduced with a \Windows directory :)) and then start a second job, the second one remains for some time (depending on the amount of data being processed by the first one) with the status "running", but showing {{"Active=1"}}, and does not progress. Setting Debug=true did not shed any light on this, unfortunately. It would be great if you could elaborate on that a little! Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown
Konstantin Avdeev created CONNECTORS-1298: - Summary: Houskeeping: jetty webapp temp directories don't get removed upon shutdown Key: CONNECTORS-1298 URL: https://issues.apache.org/jira/browse/CONNECTORS-1298 Project: ManifoldCF Issue Type: Bug Components: Framework core Environment: Windows Reporter: Konstantin Avdeev Priority: Minor Every MCF restart leaves behind webapp temp dirs like this: {code} C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/ C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/ C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/ {code} (or under {{java.io.tmpdir}}, if set) Expected behaviour: delete these dirs upon exit. Could it help to set jetty's {{persistTempDirectory}} to true for these contexts? Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
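The expected behaviour described above (temp directories removed on exit) can be sketched with the JDK alone. This is only an illustration of the idea, not how Jetty or ManifoldCF handle it: the class `TempDirCleanup` and its methods are hypothetical, and in an embedded Jetty setup the `persistTempDirectory` setting on the webapp context would probably be the more natural place to control this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: remove a webapp temp directory when the JVM exits. */
final class TempDirCleanup {
    /** Deletes dir and everything below it, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts every child ahead of its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Registers a JVM shutdown hook that removes dir (best effort). */
    static void deleteOnShutdown(Path dir) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                deleteRecursively(dir);
            } catch (IOException e) {
                System.err.println("Could not remove " + dir + ": " + e);
            }
        }));
    }

    /** Creates a throwaway directory tree, deletes it, reports whether it is gone. */
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("mcf-demo-");
            Path webapp = Files.createDirectories(root.resolve("webapp"));
            Files.createFile(webapp.resolve("index.jsp"));
            deleteRecursively(root);
            return !Files.exists(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("temp tree removed: " + demo());
    }
}
```

Note that on Windows a delete can fail while a file is still locked, so a shutdown hook like this remains best effort.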
[jira] [Created] (CONNECTORS-1297) Windows Share job stops upon a symbolic link
Konstantin Avdeev created CONNECTORS-1297: - Summary: Windows Share job stops upon a symbolic link Key: CONNECTORS-1297 URL: https://issues.apache.org/jira/browse/CONNECTORS-1297 Project: ManifoldCF Issue Type: Bug Components: JCIFS connector Affects Versions: ManifoldCF 2.3 Environment: Windows 2012, Windows 10 Reporter: Konstantin Avdeev Windows shares having a symbolic link cannot be crawled: the job stops with the exception: {code} Error: SmbException tossed: 0x8000002D {code} Stack trace example: {code} ERROR 2016-04-16 20:01:40,384 (Worker thread '76') - JCIFS: SmbException tossed processing smb://localhost/C$/Program Files/Java/jdk/ jcifs.smb.SmbException: 0x8000002D at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) at jcifs.smb.SmbTransport.send(SmbTransport.java:640) at jcifs.smb.SmbSession.send(SmbSession.java:238) at jcifs.smb.SmbTree.send(SmbTree.java:119) at jcifs.smb.SmbFile.send(SmbFile.java:775) at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989) at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2295) at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:788) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) {code} jdk is a symbolic directory link {code} 02.03.2016 12:38 jdk [jdk1.8.0] {code} Expected behaviour: treat a link as a usual directory/file. Or at least, skip it and continue the job. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored
[ https://issues.apache.org/jira/browse/CONNECTORS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241022#comment-15241022 ] Konstantin Avdeev commented on CONNECTORS-1295: --- Great support! Thank you! :) > Windows Share Connector's job: Maximum document length parameter is ignored > --- > > Key: CONNECTORS-1295 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1295 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.3 > Environment: Windows Server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.4 > > Attachments: CONNECTORS-1295.patch > > > It seems, the windows share jobs ignore the "Maximum document length" > parameter and download documents of any length, e.g.: > Edit job -> Content Length -> Maximum document length: > 50485760 > And from from the history output: > {code} > 04-14-2016 10:52:32.813 access > smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif > code:OK bytes:406334712 time:134843 > {code} > Any ideas, why this huge file was not rejected? > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored
[ https://issues.apache.org/jira/browse/CONNECTORS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241008#comment-15241008 ] Konstantin Avdeev commented on CONNECTORS-1295: --- one more double check from the "jobs" table: {{select description,docspec from jobs}} {code:xml} {code} > Windows Share Connector's job: Maximum document length parameter is ignored > --- > > Key: CONNECTORS-1295 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1295 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.3 > Environment: Windows Server 2012 >Reporter: Konstantin Avdeev >Assignee: Karl Wright > Fix For: ManifoldCF 2.4 > > Attachments: CONNECTORS-1295.patch > > > It seems, the windows share jobs ignore the "Maximum document length" > parameter and download documents of any length, e.g.: > Edit job -> Content Length -> Maximum document length: > 50485760 > And from from the history output: > {code} > 04-14-2016 10:52:32.813 access > smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif > code:OK bytes:406334712 time:134843 > {code} > Any ideas, why this huge file was not rejected? > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored
[ https://issues.apache.org/jira/browse/CONNECTORS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240989#comment-15240989 ] Konstantin Avdeev commented on CONNECTORS-1295: --- just double checked the value: no special chars as far as I can see: {code:html} Maximum document length: {code} > Windows Share Connector's job: Maximum document length parameter is ignored > --- > > Key: CONNECTORS-1295 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1295 > Project: ManifoldCF > Issue Type: Bug > Components: JCIFS connector >Affects Versions: ManifoldCF 2.3 > Environment: Windows Server 2012 >Reporter: Konstantin Avdeev > > It seems, the windows share jobs ignore the "Maximum document length" > parameter and download documents of any length, e.g.: > Edit job -> Content Length -> Maximum document length: > 50485760 > And from from the history output: > {code} > 04-14-2016 10:52:32.813 access > smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif > code:OK bytes:406334712 time:134843 > {code} > Any ideas, why this huge file was not rejected? > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored
Konstantin Avdeev created CONNECTORS-1295: - Summary: Windows Share Connector's job: Maximum document length parameter is ignored Key: CONNECTORS-1295 URL: https://issues.apache.org/jira/browse/CONNECTORS-1295 Project: ManifoldCF Issue Type: Bug Components: JCIFS connector Affects Versions: ManifoldCF 2.3 Environment: Windows Server 2012 Reporter: Konstantin Avdeev It seems the Windows share jobs ignore the "Maximum document length" parameter and download documents of any length, e.g.: Edit job -> Content Length -> Maximum document length: 50485760 And from the history output: {code} 04-14-2016 10:52:32.813 access smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif code:OK bytes:406334712 time:134843 {code} Any ideas why this huge file was not rejected? Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
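For reference, the check the reporter expected can be sketched as below, using the two numbers from the report (limit 50485760, file size 406334712). The class `LengthFilter`, its method names and the convention that a blank setting means "no limit" are assumptions made for this example; they are not the connector's real code.

```java
/** Hypothetical sketch of a maximum-document-length filter. */
final class LengthFilter {
    /** Sentinel for "no limit configured" (an assumption of this sketch). */
    static final long NO_LIMIT = -1L;

    /** Parses the configured limit; a null or blank value means no limit. */
    static long parseLimit(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return NO_LIMIT;
        }
        return Long.parseLong(configured.trim());
    }

    /** Returns true when a document of this size may be fetched. */
    static boolean withinLimit(long documentLength, long maxLength) {
        return maxLength == NO_LIMIT || documentLength <= maxLength;
    }

    public static void main(String[] args) {
        long limit = parseLimit("50485760");   // value from the job settings in the report
        long fileSize = 406334712L;            // bytes reported in the history output
        System.out.println("fetch allowed: " + withinLimit(fileSize, limit));
    }
}
```

With the values from the report the file is roughly eight times larger than the limit, so a filter of this kind would reject it before any download starts.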
[jira] [Commented] (CONNECTORS-1286) Solr Plugin: Add support for User Principal
[ https://issues.apache.org/jira/browse/CONNECTORS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224309#comment-15224309 ] Konstantin Avdeev commented on CONNECTORS-1286: --- If the patch gets simplified as follows: {code:java} if (rb.req.getUserPrincipal() != null) { domainMap.put("", rb.req.getUserPrincipal().getName()); } {code} then the solr/jetty login parameter will NOT supersede all of the formal authenticated user parameters/domains passed into the component, but it will simply be added to the {{domainMap}}, if it exists. And we would not need a new config parameter like {{AuthDomain}}, since any modifications of the user name (e.g. {{DOMAIN\USER}} -> {{u...@domain.com}}) can be achieved by the MCF mapping. So, users, starting from Solr 5.3, would be able to configure a secure search out of the box then :) What do you think? Thanks! > Solr Plugin: Add support for User Principal > --- > > Key: CONNECTORS-1286 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1286 > Project: ManifoldCF > Issue Type: Improvement > Components: Solr-5.x component >Affects Versions: ManifoldCF 2.3 >Reporter: Konrad Holl >Assignee: Karl Wright >Priority: Minor > Fix For: ManifoldCF 2.4 > > > I’m using ManifoldCF 2.3 with Solr 5.4.1 and the Velocity templating engine. > I needed to do searches with ACLs enabled and installed the plugin. > Unfortunately it is not possible to use the login information provided by > Jetty in the Solr plugin. > As of Solr 5.3 it is possible to extract the authenticated user from the > SolrQueryRequest object: > http://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/request/SolrQueryRequest.html#getUserPrincipal(). 
> I added these lines to the code in > org.apache.solr.mcf.ManifoldCFSearchComponent before the evaluation of > parameters for authenticated user name: > {code} > String authDomain = (String)args.get("AuthDomain"); > if (rb.req.getUserPrincipal() != null) { > domainMap.put("", rb.req.getUserPrincipal().getName() + > ((authDomain == null) ? "" : "@" + authDomain)); > } > else { > // Get the authenticated user name from the parameters > {code} > I also needed an additional setting “authDomain” in the search component > configuration (solrconfig.xml). Now I can use Velocity even for documents > with ACLs :o) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
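The simplified variant discussed in the comment (add the principal to the map rather than replace the other parameters) can be modelled without Solr as follows. This is a standalone sketch: `PrincipalDomainMap` and `withPrincipal` are hypothetical names, the map stands in for the plugin's `domainMap`, and the sample domain key and user names are invented for the demonstration.

```java
import java.security.Principal;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the domainMap handling in the search component. */
final class PrincipalDomainMap {
    /**
     * Returns a copy of domainMap with the container-authenticated user stored
     * under the empty domain key. All other entries are kept, so the principal
     * is an addition and not a replacement. A null principal changes nothing.
     */
    static Map<String, String> withPrincipal(Map<String, String> domainMap, Principal principal) {
        Map<String, String> result = new LinkedHashMap<>(domainMap);
        if (principal != null) {
            result.put("", principal.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fromParams = new LinkedHashMap<>();
        fromParams.put("exampleDomain", "someUser");   // invented sample entry
        Principal login = () -> "DOMAIN\\USER";        // stands in for the container login
        System.out.println(withPrincipal(fromParams, login));
    }
}
```

Any rewriting of the name, such as the `DOMAIN\USER` conversion mentioned in the comment, is deliberately left out here because the comment proposes handling it through the MCF mapping.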