[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
    [ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356893#comment-15356893 ]

Phil commented on CONNECTORS-1325:
----------------------------------

Thanks for the quick turnaround [~daddywri], I'll give it a go.

> Invalid XML character causing job to abort
> ------------------------------------------
>
> Key: CONNECTORS-1325
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
> Project: ManifoldCF
> Issue Type: Bug
> Components: SharePoint connector
> Affects Versions: ManifoldCF 2.3
> Reporter: Phil
> Assignee: Karl Wright
> Priority: Blocker
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325.patch
>
>
> The following error is causing the Manifold job to abort, and subsequently the job is not able to finish.
> It would be good to have the crawler log this error, but not throw an exception which causes the entire job to stop.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "" is an invalid XML character.
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
>     at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "" is an invalid XML character.
>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
>     ... 4 more
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort
    [ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356267#comment-15356267 ]

Phil edited comment on CONNECTORS-1325 at 6/30/16 12:43 AM:

Hi [~daddywri],

I'm finding after installing the patch that it does ignore the error. However, the crawler is continuing to attempt to process this document (or at least the metadata), resulting in the crawler never finishing. It has currently been running for a few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

Which resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames=
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item .
{code}

I've omitted some repository-specific details, but let me know if you want any further details.

Any idea why this might be happening?

Thanks


was (Author: priethmuller):
Hi [~daddywri],

I'm finding after installing the patch that it does ignore the error. However, the crawler is continuing to attempt to process this document (or at least hte metadata), resulting in the crawler never finishing. Its currently being running for a few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

Which resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames=
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item .
{code}

I've omitted some repository specific details, but let me know if you want any further details.

Any idea why this might be happening?

Thanks
[jira] [Reopened] (CONNECTORS-1325) Invalid XML character causing job to abort
    [ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil reopened CONNECTORS-1325:
------------------------------

Hi [~daddywri],

I'm finding after installing the patch that it does ignore the error. However, the crawler is continuing to attempt to process this document (or at least the metadata), resulting in the crawler never finishing. It has currently been running for a few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

Which resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames=
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item .
{code}

I've omitted some repository-specific details, but let me know if you want any further details.

Any idea why this might be happening?

Thanks
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
    [ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345907#comment-15345907 ]

Phil commented on CONNECTORS-1325:
----------------------------------

Without any further evidence, that's fair enough. I think as long as the crawler is able to move past the document, then this would be a good solution.
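
For illustration only, "log the error and move past the document" could look roughly like the sketch below. This is not the attached patch and not ManifoldCF code: the class name LenientParseSketch, the method parseOrSkip, and the use of System.err in place of the crawler's logger are all assumptions made for the example. It uses only the standard JAXP DocumentBuilder API that appears in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not ManifoldCF code: parse one document's XML and
// report a parse failure as "skip this document" instead of an exception.
class LenientParseSketch {

    static Optional<Document> parseOrSkip(String documentId, String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return Optional.of(builder.parse(new ByteArrayInputStream(bytes)));
        } catch (SAXException e) {
            // Malformed XML (for example an invalid character reference):
            // log it and let the caller move on to the next document.
            System.err.println("Skipping " + documentId + ": " + e.getMessage());
            return Optional.empty();
        } catch (IOException | ParserConfigurationException e) {
            // Not a content problem, so it is not swallowed here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first input contains a character reference XML 1.0 forbids.
        System.out.println(parseOrSkip("doc-1", "<a>&#x1F;</a>").isPresent());
        System.out.println(parseOrSkip("doc-2", "<a>ok</a>").isPresent());
    }
}
```

The caller would still have to record the document as handled when the result is empty; otherwise it may simply be queued and attempted again.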
[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
    [ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345901#comment-15345901 ]

Phil commented on CONNECTORS-1325:
----------------------------------

Thanks Karl. If that's how SharePoint outputs content, then it is an interesting problem!

I'm not sure in this instance how many documents it affects, or whether it's related to an issue with the user creating the document. If it's a user-related issue with creating the content, then logging it as an error and continuing the crawl makes sense. If it's not a widespread problem, then maybe this is still the best approach in all cases?
[jira] [Created] (CONNECTORS-1325) Invalid XML character causing job to abort
Phil created CONNECTORS-1325:

Summary: Invalid XML character causing job to abort
Key: CONNECTORS-1325
URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
Project: ManifoldCF
Issue Type: Bug
Components: SharePoint connector
Affects Versions: ManifoldCF 2.3
Reporter: Phil
Priority: Blocker


The following error is causing the Manifold job to abort, and subsequently the job is not able to finish.

It would be good to have the crawler log this error, but not throw an exception which causes the entire job to stop.

{code}
ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "" is an invalid XML character.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "" is an invalid XML character.
    at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
    at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "" is an invalid XML character.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
    ... 4 more
{code}
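
As background on the error itself: XML 1.0 only allows the code points #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, and a numeric character reference to anything else makes the parser fail with the message above. One possible workaround, sketched here under assumptions and not taken from the ManifoldCF source, is to drop such references from the response text before parsing. The class and method names are hypothetical, and the sketch does not treat CDATA sections specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-filter, not ManifoldCF code: removes numeric character
// references that point at code points XML 1.0 does not allow.
class XmlCharRefFilter {

    // Matches decimal (&#11;) and hexadecimal (&#xB;) character references.
    private static final Pattern CHAR_REF = Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    /** True if the code point is permitted by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops forbidden character references and leaves everything else untouched. */
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body, 10);
            } catch (NumberFormatException e) {
                cp = -1; // too large to be any code point, so treated as invalid
            }
            String replacement = isValidXmlChar(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInvalidCharRefs("<a>x&#x1F;y&#65;z&#11;</a>"));
    }
}
```

Filtering like this silently changes the field value, so logging each removal would probably be wise if the approach were used for real.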
[jira] [Created] (CONNECTORS-1291) Sharepoint generating invalid URLs
Phil created CONNECTORS-1291:

Summary: Sharepoint generating invalid URLs
Key: CONNECTORS-1291
URL: https://issues.apache.org/jira/browse/CONNECTORS-1291
Project: ManifoldCF
Issue Type: Bug
Components: SharePoint connector
Affects Versions: ManifoldCF 2.3
Reporter: Phil


The SharePoint connector currently generates invalid URLs, which are then sent on to the output connector.

For example, the following URL is generated:

https://Access Form Administration/DispForm.aspx

I'd suggest it should be sent to the output connector as:

https://Access%20Form%20Administration/DispForm.aspx
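
A minimal sketch of the kind of encoding suggested above, using the multi-argument java.net.URI constructor, which percent-encodes characters that are not legal in a path. The class name, the method name and the host sharepoint.example.com are placeholders for the example and do not come from the connector.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch, not connector code: build a URL from its parts so
// that spaces in the path come out as %20.
class PathEncodingSketch {

    static String encode(String scheme, String host, String path) {
        try {
            // URI(scheme, host, path, fragment) quotes illegal characters in the path.
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            encode("https", "sharepoint.example.com", "/Access Form Administration/DispForm.aspx"));
    }
}
```

Note that this constructor always quotes the percent character, so it should be given the raw path; passing an already encoded path would encode it a second time.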