[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-06-30 Thread Phil (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356893#comment-15356893
 ] 

Phil commented on CONNECTORS-1325:
--

Thanks for the quick turnaround [~daddywri], I'll give it a go.

> Invalid XML character causing job to abort
> --
>
> Key: CONNECTORS-1325
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.3
>Reporter: Phil
>Assignee: Karl Wright
>Priority: Blocker
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325.patch
>
>
> The following error causes the Manifold job to abort, so the job is never 
> able to finish.
> It would be good to have the crawler log this error rather than throw an 
> exception that stops the entire job.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "" is an invalid XML character.
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
>     at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "" is an invalid XML character.
>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
>     ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-06-29 Thread Phil (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356267#comment-15356267
 ] 

Phil edited comment on CONNECTORS-1325 at 6/30/16 12:43 AM:


Hi [~daddywri],

After installing the patch, I'm finding that it does ignore the error. However, 
the crawler keeps attempting to process this document (or at least its 
metadata), so the crawl never finishes. It has currently been running for a 
few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

This resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item 
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames= 
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of 
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item 

.

{code}

I've omitted some repository-specific details, but let me know if you want any 
further information.

Any idea why this might be happening?

Thanks


was (Author: priethmuller):
Hi [~daddywri],

After installing the patch, I'm finding that it does ignore the error. However, 
the crawler keeps attempting to process this document (or at least its 
metadata), so the crawl never finishes. It has currently been running for a 
few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

This resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') 
sharepoint.SharePointRepository - SharePoint: Finding metadata to include for 
document/item 
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - 
SharePoint: In getFieldValues; fieldNames= 
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') 
sharepoint.SharePointRepository - SharePoint: Getting version of 
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') 
sharepoint.SharePointRepository - SharePoint: Checking whether to include list 
item 

.

{code}

I've omitted some repository-specific details, but let me know if you want any 
further information.

Any idea why this might be happening?

Thanks






[jira] [Reopened] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-06-29 Thread Phil (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil reopened CONNECTORS-1325:
--

Hi [~daddywri],

After installing the patch, I'm finding that it does ignore the error. However, 
the crawler keeps attempting to process this document (or at least its 
metadata), so the crawl never finishes. It has currently been running for a 
few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

This resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item 
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames= 
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of 
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item 

.

{code}

I've omitted some repository-specific details, but let me know if you want any 
further information.

Any idea why this might be happening?

Thanks






[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-06-23 Thread Phil (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345907#comment-15345907
 ] 

Phil commented on CONNECTORS-1325:
--

Without any further evidence, that's fair enough. I think that as long as the 
crawler is able to move past the document, this would be a good solution.






[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-06-23 Thread Phil (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345901#comment-15345901
 ] 

Phil commented on CONNECTORS-1325:
--

Thanks Karl.

If that's how SharePoint outputs content, then it is an interesting problem! In 
this instance I'm not sure how many documents it affects, or whether it's 
related to how the user created the document. If it is a user-related issue 
with creating the content, then logging it as an error and continuing the crawl 
makes sense.

If it's not a widespread problem, then maybe this is still the best approach in 
all cases?






[jira] [Created] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-06-22 Thread Phil (JIRA)
Phil created CONNECTORS-1325:


 Summary: Invalid XML character causing job to abort
 Key: CONNECTORS-1325
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
 Project: ManifoldCF
  Issue Type: Bug
  Components: SharePoint connector
Affects Versions: ManifoldCF 2.3
Reporter: Phil
Priority: Blocker


The following error causes the Manifold job to abort, so the job is never 
able to finish.

It would be good to have the crawler log this error rather than throw an 
exception that stops the entire job.

{code}
ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "" is an invalid XML character.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "" is an invalid XML character.
    at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
    at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "" is an invalid XML character.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
    ... 4 more
{code}
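
The behaviour requested above (log the parsing error, skip the affected item, and let the job carry on) might look roughly like the sketch below. It is only an illustration and not ManifoldCF's actual code: the class name LenientXmlParse, the method parseOrNull, and the sample character reference `&#x1F;` used in the usage note are hypothetical (the real reference is not shown in the log), and the sketch uses the JDK's built-in JAXP parser rather than ManifoldCF's XMLDoc class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of ManifoldCF.
final class LenientXmlParse {

    private LenientXmlParse() {
    }

    // Returns the parsed document, or null when the XML is not well-formed
    // (for example, when it contains a character reference to a character
    // that XML 1.0 does not allow).
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them,
            // so the only log output is the line written below.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXException e) {
            // Log and skip instead of letting the exception stop the caller.
            System.err.println("Skipping item, XML parsing error: " + e.getMessage());
            return null;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for in-memory input with a default parser setup.
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a call such as `LenientXmlParse.parseOrNull("<a>&#x1F;</a>")` returns null instead of throwing, and the caller can treat that as "no metadata available for this item" and move on to the next document.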





[jira] [Created] (CONNECTORS-1291) Sharepoint generating invalid URLs

2016-03-22 Thread Phil (JIRA)
Phil created CONNECTORS-1291:


 Summary: Sharepoint generating invalid URLs
 Key: CONNECTORS-1291
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1291
 Project: ManifoldCF
  Issue Type: Bug
  Components: SharePoint connector
Affects Versions: ManifoldCF 2.3
Reporter: Phil


The SharePoint connector currently generates invalid URLs, which are then sent 
on to the output connector. 

For example, the following URL is generated:
https://Access Form Administration/DispForm.aspx

I'd suggest it should be sent to the output connector as:
https://Access%20Form%20Administration/DispForm.aspx
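
As a rough sketch of the suggested encoding, the multi-argument java.net.URI constructor quotes characters that are illegal in a path, such as spaces. The class name UrlPathEncoding, the method encode, and the host sharepoint.example.com in the usage note are hypothetical placeholders (the host is not shown in the URL above), and this is not the connector's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of ManifoldCF.
final class UrlPathEncoding {

    private UrlPathEncoding() {
    }

    // Builds a URL string from unencoded parts, percent-encoding any
    // characters that are not legal in a URI path (a space becomes %20).
    static String encode(String scheme, String host, String rawPath) {
        try {
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                "Cannot build a URI from path: " + rawPath, e);
        }
    }
}
```

For instance, `UrlPathEncoding.encode("https", "sharepoint.example.com", "/Access Form Administration/DispForm.aspx")` yields a URL whose path reads `/Access%20Form%20Administration/DispForm.aspx`. One caveat: these constructors always quote the percent character, so the path passed in has to be the raw, unencoded form, otherwise it would be encoded twice.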


