[jira] [Updated] (CONNECTORS-1298) Housekeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-14 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1298:
--
Summary: Housekeeping: jetty webapp temp directories don't get removed upon 
shutdown  (was: Houskeeping: jetty webapp temp directories don't get removed 
upon shutdown)

> Housekeeping: jetty webapp temp directories don't get removed upon shutdown
> ---
>
> Key: CONNECTORS-1298
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1298
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
> Environment: Windows
>Reporter: Konstantin Avdeev
>Assignee: Furkan KAMACI
>Priority: Minor
> Attachments: mcf-temp-folders.jpg
>
>
> Every MCF restart leaves behind webapp temp dirs like these:
> {code}
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/
> {code}
> (or under {{java.io.tmpdir}}, if set)
> Expected behaviour: delete these dirs upon exit.
> Could it help to set Jetty's {{persistTempDirectory}} to false for these
> contexts?
> Thank you!
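For illustration only: a minimal, stdlib-only Java sketch of "delete these dirs upon exit", assuming the application knows the path of each webapp temp dir it created. It registers a JVM shutdown hook, which runs on normal exit and on Ctrl-C but not when the process is killed forcibly, and deletes the tree deepest-first. The class and method names are hypothetical; they are not part of ManifoldCF or Jetty.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TempDirCleanup {

    // Deletes a directory tree; children are removed before their parents.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // A parent path sorts before its children, so reverse order is deepest-first.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    // Registers a shutdown hook that removes dir when the JVM exits normally
    // or on Ctrl-C. It does not run if the process is terminated forcibly.
    static Thread deleteOnExit(Path dir) {
        Thread hook = new Thread(() -> {
            try {
                deleteRecursively(dir);
            } catch (IOException e) {
                // For example, files still locked on Windows: report and give up.
                System.err.println("Could not delete " + dir + ": " + e);
            }
        }, "temp-dir-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a webapp temp dir.
        Path dir = Files.createTempDirectory("jetty-example-");
        Files.createDirectories(dir.resolve("webapp/WEB-INF"));
        Files.write(dir.resolve("webapp/index.html"), "hello".getBytes(StandardCharsets.UTF_8));
        deleteOnExit(dir);
        System.out.println("Created " + dir + "; it is removed when the JVM exits.");
    }
}
```

Anything the hook cannot remove (a hard kill, or files still locked on Windows) would still need to be cleaned up some other way, for example at the next startup.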



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861643#comment-15861643
 ] 

Konstantin Avdeev commented on CONNECTORS-1298:
---

I don't think it's a Jetty issue; it looks like we have the case
described at the bottom of [Temporary Directories|http://www.eclipse.org/jetty/documentation/current/ref-temporary-directories.html].



[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861629#comment-15861629
 ] 

Konstantin Avdeev commented on CONNECTORS-1298:
---

These signals are sent sequentially, so Ctrl-C should be the only one sent.
And again, the same issue occurs with {{Ctrl-C}} (aka {{^C}}) when starting from a cmd
window.



[jira] [Comment Edited] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861595#comment-15861595
 ] 

Konstantin Avdeev edited comment on CONNECTORS-1298 at 2/10/17 5:55 PM:


Ctrl-C; or, when it runs as an [NSSM|http://nssm.cc/usage#shutdown] service,
NSSM generates a Ctrl-C as well:
!http://nssm.cc/images/install_shutdown.png!

P.S. My timeout is set to 5 sec.


was (Author: kavdeev):
Ctrl-C; or, when it runs as an [NSSM|http://nssm.cc/usage] service,
NSSM generates a Ctrl-C as well:
!http://nssm.cc/images/install_shutdown.png!

P.S. My timeout is set to 5 sec.



[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861595#comment-15861595
 ] 

Konstantin Avdeev commented on CONNECTORS-1298:
---

Ctrl-C; or, when it runs as an [NSSM|http://nssm.cc/usage] service,
NSSM generates a Ctrl-C as well:
!http://nssm.cc/images/install_shutdown.png!

P.S. My timeout is set to 5 sec.



[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861488#comment-15861488
 ] 

Konstantin Avdeev commented on CONNECTORS-1298:
---

Sorry, I can't follow... If they get reused, then why does Jetty deploy them again
and again?
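A possible explanation, not confirmed in this thread: the trailing digits in each name (e.g. {{-125313306681249528.dir}}) look like the random suffix that {{java.io.File.createTempFile}} appends, which would mean every start creates a fresh, uniquely named directory instead of reusing the previous one. The hypothetical sketch below splits such names into their parts; the pattern (host, port, war, context path, what appears to be a virtual-host field {{any}}, random suffix) is inferred only from the three example names in the description, so treat it as an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JettyTempDirName {

    // Assumed layout, inferred from the example names in this issue:
    // jetty-<host>-<port>-<war>-<context>-any-<random>.dir
    private static final Pattern NAME = Pattern.compile(
        "^jetty-(.+?)-(\\d+)-(.+?\\.war)-(_.*?)-any-(\\d+)\\.dir$");

    final String host, port, war, context, suffix;

    private JettyTempDirName(Matcher m) {
        host = m.group(1);
        port = m.group(2);
        war = m.group(3);
        context = m.group(4);
        suffix = m.group(5);
    }

    // Returns null when the name does not look like a Jetty webapp temp dir.
    static JettyTempDirName parse(String dirName) {
        Matcher m = NAME.matcher(dirName);
        return m.matches() ? new JettyTempDirName(m) : null;
    }

    // Everything except the random suffix, i.e. the part identical across restarts.
    String stableKey() {
        return String.join("|", host, port, war, context);
    }

    public static void main(String[] args) {
        String[] names = {
            "jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir",
            "jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir",
            "jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir",
        };
        for (String name : names) {
            JettyTempDirName n = parse(name);
            System.out.println(n.war + " " + n.context + " random suffix " + n.suffix);
        }
    }
}
```

If that reading is right, two restarts of the same webapp share a {{stableKey()}} and differ only in the suffix, which is why the old directories pile up rather than being reused.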



[jira] [Commented] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861483#comment-15861483
 ] 

Konstantin Avdeev commented on CONNECTORS-1298:
---

Every MCF restart deploys 3 Jetty contexts, and their temp dirs are left behind in either case
(clean shutdown or killed):

!mcf-temp-folders.jpg!
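Because a killed process never runs its shutdown hooks, one hypothetical complement is a sweep at startup that removes leftovers from earlier runs. The sketch below assumes the leftover dirs sit directly under {{java.io.tmpdir}}, start with {{jetty-}}, end with {{.dir}}, contain one of the given war names, and were last modified before a cutoff (for example the current JVM start time), so that the current run's dirs are left alone. The class name, the matching rule and the cutoff are assumptions, not existing ManifoldCF behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StaleJettyDirSweeper {

    // True if the directory name looks like a Jetty temp dir for one of warNames.
    static boolean isCandidate(String name, List<String> warNames) {
        return name.startsWith("jetty-") && name.endsWith(".dir")
            && warNames.stream().anyMatch(w -> name.contains("-" + w + "-"));
    }

    // Removes matching dirs under tmpDir that were last modified before cutoff.
    // Returns how many were removed; dirs that cannot be deleted are skipped.
    static int sweep(Path tmpDir, List<String> warNames, Instant cutoff) throws IOException {
        List<Path> candidates;
        try (Stream<Path> entries = Files.list(tmpDir)) {
            candidates = entries
                .filter(Files::isDirectory)
                .filter(p -> isCandidate(p.getFileName().toString(), warNames))
                .collect(Collectors.toList());
        }
        int removed = 0;
        for (Path dir : candidates) {
            if (!Files.getLastModifiedTime(dir).toInstant().isBefore(cutoff)) {
                continue; // probably belongs to the current run
            }
            try {
                deleteRecursively(dir);
                removed++;
            } catch (IOException e) {
                System.err.println("Skipping " + dir + ": " + e);
            }
        }
        return removed;
    }

    // Deletes a directory tree, deepest entries first.
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        List<String> wars = Arrays.asList(
            "mcf-crawler-ui.war", "mcf-authority-service.war", "mcf-api-service.war");
        // Dry run: only list what would be considered, delete nothing.
        try (Stream<Path> entries = Files.list(tmp)) {
            entries.filter(p -> isCandidate(p.getFileName().toString(), wars))
                   .forEach(p -> System.out.println("Would consider: " + p));
        }
    }
}
```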



[jira] [Updated] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2017-02-10 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1298:
--
Attachment: mcf-temp-folders.jpg



[jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-13 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571579#comment-15571579
 ] 

Konstantin Avdeev edited comment on CONNECTORS-1325 at 10/13/16 10:55 AM:
--

The company does not use emojis :)
It was just an example of how to reproduce the issue.
The original problem was with the "record separator" char (U+001E), which is used in
some libraries.

BTW: &#30; (decimal) is not a valid XML char either:
http://www.w3schools.com/xml/xml_validator.asp
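For reference, it is the XML 1.0 {{Char}} production (production [2] of the XML 1.0 Fifth Edition spec) that rules out U+001E: control characters other than tab, LF and CR are not allowed, and the "Legal Character" constraint applies to character references too. The small Java check below encodes that production; the class and method names are illustrative only.

```java
class XmlChars {

    // XML 1.0 production [2]:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Record separator, an emoji, its two UTF-16 surrogate halves, and 'A'.
        int[] samples = {0x1E, 0x1F600, 0xD83D, 0xDE00, 'A'};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, isXml10Char(cp) ? "allowed" : "not allowed");
        }
    }
}
```

Note that the same check also rejects the individual surrogate halves U+D83D and U+DE00, while the full code point U+1F600 is allowed.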



was (Author: kavdeev):
The company does not use emojis :)
It was just an example of how to reproduce the issue.
The original problem was with the "record separator" char, which is used in
some libraries.

> Invalid XML character causing job to abort
> --
>
> Key: CONNECTORS-1325
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.3
>Reporter: Phil
>Assignee: Karl Wright
>Priority: Blocker
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325-3.patch, 
> CONNECTORS-1325.patch, mcf-bad-ms-char.xml
>
>
> The following error causes the Manifold job to abort, and as a result
> the job is never able to finish.
> It would be good to have the crawler log this error rather than throw an
> exception that causes the entire job to stop.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - 
> Exception tossed: XML parsing error: Character reference "" is an 
> invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: 
> Character reference "" is an invalid XML character.
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
> at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
> at 
> org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
> at 
> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; 
> Character reference "" is an invalid XML character.
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
> ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-13 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571573#comment-15571573
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

The hex form (the surrogate pair &#xD83D;&#xDE00;) is not valid XML: http://www.w3schools.com/xml/xml_validator.asp
The decimal form (&#128512;) has no issues.
Thanks!



[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-13 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571456#comment-15571456
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

An important update!

I tested the "bad" char again by inspecting the network traffic (http wire logging at
DEBUG) to see exactly what comes from SharePoint,

and it turned out that this emoji char gets translated into a "wrong" format
on the MCF side: &#128512; ---> &#xD83D;&#xDE00;

{code}
DEBUG 2016-10-13 11:39:45,460 (Thread-2572) - http-outgoing-100 << "#' 
ows__ModerationStatus='0' ows__Level='1' ows_Title='Task emoji 
' 
ows_UniqueId='5;#{8F6DF977-9814-4AA0-B7AE-E29838C508CF}' 
ows_owshiddenversion='3' ows_FSObjType='5;#0' ows_PermMask='0x7fff' 
ows_FileRef='5;#sites/test-team/Lists/Main Task List/5_.000' />[\r][\n]"
...
DEBUG 2016-10-13 11:39:45,461 (Worker thread '45') - SharePoint: getListItems 
FileRef value 'sites/test-team/Lists/Main Task List/5_.000', xml response: 
'http://schemas.microsoft.com/sharepoint/soap/;>

   

'
DEBUG 2016-10-13 11:39:45,494 (Worker thread '45') - SharePoint: Can't get 
version of '/Main Task List///5_.000' because of bad XML characters(?)
{code}

and the character reference &#128512; is valid XML 1.0!
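One way such a translation could happen (a hypothesis, not something confirmed from the MCF source): U+1F600 is stored in a Java String as the UTF-16 surrogate pair \uD83D\uDE00, so code that escapes characters one {{char}} at a time emits two surrogate references, which no XML version accepts, whereas escaping per code point emits a single valid reference. The hypothetical helpers below show the difference; {{&#x1F600;}} and {{&#128512;}} refer to the same character.

```java
class NumericCharRefs {

    // Escapes each UTF-16 code unit separately. For characters outside the BMP
    // this yields two surrogate references, which are not legal XML characters.
    static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("&#x%X;", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Escapes each Unicode code point, so U+1F600 becomes a single reference.
    static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("&#x%X;", cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(128512)); // U+1F600
        System.out.println("length in chars: " + emoji.length());       // 2
        System.out.println("per char:       " + escapePerChar(emoji));      // &#xD83D;&#xDE00;
        System.out.println("per code point: " + escapePerCodePoint(emoji)); // &#x1F600;
    }
}
```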

Could you please take a look at the parser?
Thank you!



[jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-13 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571237#comment-15571237
 ] 

Konstantin Avdeev edited comment on CONNECTORS-1325 at 10/13/16 8:43 AM:
-

The Stack Overflow thread you mentioned in the second message here describes
the problem quite well:
this character encoding was introduced in XML 1.1:
https://www.w3.org/TR/xml11/#sec-xml11
and a possible solution is setting the correct header: {code}{code}
I'm afraid it would take ages to get this fixed by MS.

P.S. The correct XML prolog won't help with emojis, but at least it would
solve the issue with our "record separator" :)
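The P.S. can be checked with the JDK's built-in JAXP parser, assuming its default implementation (which supports XML 1.1 documents): the same {{&#30;}} reference is rejected under an XML 1.0 prolog and accepted under an XML 1.1 prolog, while the surrogate references are rejected either way. The documents and the class name below are made up for the test; SharePoint's actual response is not reproduced.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class XmlVersionCheck {

    // Returns null if xml is well-formed, otherwise the parser's error message.
    static String parseError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] docs = {
            "<?xml version=\"1.0\"?><t>&#30;</t>",            // record separator, XML 1.0
            "<?xml version=\"1.1\"?><t>&#30;</t>",            // record separator, XML 1.1
            "<?xml version=\"1.1\"?><t>&#xD83D;&#xDE00;</t>", // surrogate references
            "<?xml version=\"1.0\"?><t>&#128512;</t>",        // single code point reference
        };
        for (String doc : docs) {
            String err = parseError(doc);
            System.out.println(doc + " -> " + (err == null ? "OK" : err));
        }
    }
}
```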

To be honest, I'm not sure what we could do here; I'm not a fan of workarounds.
We could leave it as it is now, but could you perhaps raise the "bad
character" warnings to WARN level? Currently they are logged at DEBUG only,
which could be misleading in a production environment.


was (Author: kavdeev):
The Stack Overflow thread you mentioned in the second message here describes
the problem quite well:
this character encoding was introduced in XML 1.1:
https://www.w3.org/TR/xml11/#sec-xml11
and the solution is setting the correct header: {code}{code}
I'm afraid it would take ages to get this fixed by MS.



[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-12 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569228#comment-15569228
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

OK, the XML response has been attached.



[jira] [Updated] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-12 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1325:
--
Attachment: mcf-bad-ms-char.xml

Bad char in the Title field



[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-12 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569216#comment-15569216
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

Oops, the Confluence parser turned the Title text into a readable form :)
Trying again:

ows_Title="Task emoji >>><<<"



[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-12 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569211#comment-15569211
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

Hi Karl,

I think the issue can be reproduced easily by putting an emoji (e.g. ) into
a field of a task list:

{code}
DEBUG 2016-10-12 18:32:47,521 (Worker thread '72') - SharePoint: getListItems 
FileRef value 'sites/test-team/Lists/Main Task List/5_.000', xml response: 
'http://schemas.microsoft.com/sharepoint/soap/;>

   

'
DEBUG 2016-10-12 18:32:47,522 (Worker thread '72') - SharePoint: Can't get 
version of '/Main Task List///5_.000' because of bad XML characters(?)
{code}

Thanks!



[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-12 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567863#comment-15567863
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

Thank you, Karl! The patch seems to work: we were able to complete the 
crawl. Unfortunately, all documents from that particular library contain this 
record separator character, so there is no content in the index.
We'd need a pre-parsing stage here ;)

P.S. Just a note: the complete patch is not yet integrated into v2.5.
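One way such a pre-parsing stage could look, sketched here as an assumption rather than as what the attached patches do: drop every character that the XML 1.0 {{Char}} production does not allow ({{#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]}}), both as raw characters and as numeric character references such as {{&#x1e;}} (the record separator, U+001E). The class name {{XmlCharSanitizer}} is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parsing filter, not part of ManifoldCF.
public class XmlCharSanitizer {
    // Numeric character references: decimal (&#30;) or hexadecimal (&#x1e;).
    private static final Pattern CHAR_REF = Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    /** True if the code point matches the XML 1.0 "Char" production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static String sanitize(String xml) {
        // Pass 1: drop raw characters that are not legal XML 1.0 characters.
        // codePoints() yields lone surrogates as-is, so they are dropped too.
        StringBuilder raw = new StringBuilder(xml.length());
        xml.codePoints().filter(XmlCharSanitizer::isXml10Char).forEach(raw::appendCodePoint);

        // Pass 2: drop character references that point to illegal code points.
        Matcher m = CHAR_REF.matcher(raw);
        StringBuffer out = new StringBuffer(raw.length());
        while (m.find()) {
            int cp;
            try {
                cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // value too large for an int: treat as invalid
            }
            m.appendReplacement(out, isXml10Char(cp) ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Note that this silently removes data, and it also rewrites references inside CDATA sections or comments, where they are plain text rather than escapes; logging what was removed per document may be preferable.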



[jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort

2016-10-06 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551697#comment-15551697
 ] 

Konstantin Avdeev commented on CONNECTORS-1325:
---

Sure:

{code}
DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Getting 
version of '/DevelopmentDocuments//Test/OP/VM2000248'
DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Checking 
whether to include document '/DevelopmentDocuments/Test/OP/VM2000248'
DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: File 
'/DevelopmentDocuments/Test/OP/VM2000248' exactly matched rule path 
'/DevelopmentDocuments/Test/OP/?'
DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Including file 
'/DevelopmentDocuments/Test/OP/VM2000248'
DEBUG 2016-10-05 14:45:56,511 (Worker thread '21') - SharePoint: Finding 
metadata to include for document/item '/DevelopmentDocuments/Test/OP/VM2000248'.
DEBUG 2016-10-05 14:45:56,515 (Worker thread '21') - SharePoint: In 
getFieldValues; fieldNames=[Ljava.lang.String;@6ddaf458, site='', 
docLibrary='{1C165434-6546-4955-8BF2-05D9632AD202}', 
docId='/DevelopmentDocuments/Test/OP/VM2000248', dspStsWorks=false
DEBUG 2016-10-05 14:45:56,600 (Worker thread '21') - SharePoint: Got a remote 
exception getting field values for site  library 
{1C165434-6546-4955-8BF2-05D9632AD202} document 
[/DevelopmentDocuments/Test/OP/VM2000248] - retrying
AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
 faultSubcode: 
 faultString: org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 
15716; Character reference "" is an invalid XML character.
 faultActor: 
 faultNode: 
 faultDetail: 
{http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException; 
lineNumber: 6; columnNumber: 15716; Character reference "" is an invalid 
XML character.
at 
org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown 
Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.scanCharReferenceValue(Unknown 
Source)
at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
at 
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
at 
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
 Source)
at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at 
org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
at 
org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at 
com.microsoft.schemas.sharepoint.soap.ListsSoapStub.getListItems(ListsSoapStub.java:1841)
at 
org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2099)
at 
org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1433)
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)

{http://xml.apache.org/axis/}hostname:CDE32616

org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 15716; Character 
reference "" is an invalid XML character.
at org.apache.axis.AxisFault.makeFault(AxisFault.java:101)
at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:701)
at 

[jira] [Commented] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix

2016-08-31 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452384#comment-15452384
 ] 

Konstantin Avdeev commented on CONNECTORS-1328:
---

I can confirm that MCF 2.5 running on Unix now works with a SharePoint repository.
The ticket can be closed, thanks!

> Problem with Sharepoint repositories on Unix
> 
>
> Key: CONNECTORS-1328
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1328
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 2.4
> Environment: Red Hat (Linux), oracle java 1.8.0
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
>
> The UI cannot check the status of a SharePoint repository; it throws an exception:
> {code}
> [qtp133250414-16] WARN org.eclipse.jetty.servlet.ServletHandler - 
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /viewconnection.jsp at line 77
> 74:   {
> 75: try
> 76: {
> 77:   connectionStatus = c.check();
> 78: }
> 79: finally
> 80: {
> Stacktrace:
>   at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>   at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:412)
>   at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:497)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>   at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: javax.servlet.ServletException: java.lang.ClassFormatError: Absent 
> Code attribute in method that is not native or abstract in class file 
> javax/xml/rpc/ServiceException
>   at 
> org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:865)
>   at 
> org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:794)
>   at 
> org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:528)
>   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
>   ... 23 more
> Caused by: java.lang.ClassFormatError: Absent Code attribute in method that 
> is not native or abstract in class file javax/xml/rpc/ServiceException
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>   at java.security.AccessController.doPrivileged(Native Method)
>   

[jira] [Commented] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix

2016-08-31 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452305#comment-15452305
 ] 

Konstantin Avdeev commented on CONNECTORS-1328:
---

I haven't tried that, but it could be an option too.
Thanks!


[jira] [Commented] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix

2016-08-12 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419125#comment-15419125
 ] 

Konstantin Avdeev commented on CONNECTORS-1328:
---

It gets downloaded to {{connector-common-lib/javaee-api-6.0.jar}}; I'm not sure 
which component depends on it.
Maybe {{3.1.0}}


[jira] [Updated] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix

2016-07-29 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1328:
--
Description: 
The UI cannot check the status of a SharePoint repository; it throws an exception:

{code}
[qtp133250414-16] WARN org.eclipse.jetty.servlet.ServletHandler - 
org.apache.jasper.JasperException: An exception occurred processing JSP page 
/viewconnection.jsp at line 77

74:   {
75: try
76: {
77:   connectionStatus = c.check();
78: }
79: finally
80: {


Stacktrace:
at 
org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:412)
at 
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.servlet.ServletException: java.lang.ClassFormatError: Absent 
Code attribute in method that is not native or abstract in class file 
javax/xml/rpc/ServiceException
at 
org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:865)
at 
org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:794)
at 
org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:528)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
... 23 more
Caused by: java.lang.ClassFormatError: Absent Code attribute in method that is 
not native or abstract in class file javax/xml/rpc/ServiceException
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at 
org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getSession(SharePointRepository.java:331)
at 
org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.check(SharePointRepository.java:445)
at 
org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:233)
... 26 more
{code}

Upgrading {{javaee-api.jar}} to 7.0 solved the problem:
http://central.maven.org/maven2/javax/javaee-api/7.0/
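A {{ClassFormatError}} reading "Absent Code attribute in method that is not native or abstract" usually means a class was loaded from an API-only jar whose method bodies were stripped (such jars are meant for compiling against, not for running), and the javaee-api 6.0 artifact is commonly reported to be such a jar, which would fit the fix above; treat this as an assumption about the root cause. To check which jars on a classpath supply a class file without triggering the error, one can list the class as a resource instead of loading it. {{ClassProviderFinder}} is a hypothetical diagnostic, not part of ManifoldCF.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of ManifoldCF.
public class ClassProviderFinder {
    /**
     * Lists every location visible to the loader that provides the given class
     * file (e.g. "javax/xml/rpc/ServiceException.class"). The class itself is
     * never defined, so a stripped class file cannot cause ClassFormatError here.
     */
    public static List<URL> findProviders(ClassLoader loader, String resourceName) {
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String resource = args.length > 0 ? args[0] : "javax/xml/rpc/ServiceException.class";
        List<URL> providers = findProviders(ClassLoader.getSystemClassLoader(), resource);
        if (providers.isEmpty()) {
            System.out.println("No provider found for " + resource);
        }
        for (URL url : providers) {
            // For a jar this looks like jar:file:/path/to/some.jar!/javax/...
            System.out.println(url);
        }
    }
}
```

Run it with the same classpath as the failing process (for example {{-cp}} pointing at {{connector-common-lib/*}}, an assumption about the layout). If more than one jar supplies {{javax/xml/rpc/ServiceException.class}}, the one earliest on the classpath is normally the one that gets loaded.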

[jira] [Created] (CONNECTORS-1328) Problem with Sharepoint repositories on Unix

2016-07-29 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1328:
-

 Summary: Problem with Sharepoint repositories on Unix
 Key: CONNECTORS-1328
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1328
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework core
Affects Versions: ManifoldCF 2.4
 Environment: Red Hat (Linux), oracle java 1.8.0
Reporter: Konstantin Avdeev


The UI cannot check the status of a SharePoint repository; it throws an exception:

{code}
[qtp133250414-16] WARN org.eclipse.jetty.servlet.ServletHandler - 
org.apache.jasper.JasperException: An exception occurred processing JSP page 
/viewconnection.jsp at line 77

74:   {
75: try
76: {
77:   connectionStatus = c.check();
78: }
79: finally
80: {


Stacktrace:
at 
org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:412)
at 
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.servlet.ServletException: java.lang.ClassFormatError: Absent 
Code attribute in method that is not native or abstract in class file 
javax/xml/rpc/ServiceException
at 
org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:865)
at 
org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:794)
at 
org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:528)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
... 23 more
Caused by: java.lang.ClassFormatError: Absent Code attribute in method that is 
not native or abstract in class file javax/xml/rpc/ServiceException
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at 
org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getSession(SharePointRepository.java:331)
at 

[jira] [Commented] (CONNECTORS-1286) Solr Plugin: Add support for User Principal

2016-07-22 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389174#comment-15389174
 ] 

Konstantin Avdeev commented on CONNECTORS-1286:
---

hi Karl,
yes, I guess he would still like this feature to be implemented :)
To be honest, this would be a really good integration with Solr: users 
would be able to authenticate out of the box.
Thank you!


> Solr Plugin: Add support for User Principal
> ---
>
> Key: CONNECTORS-1286
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1286
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr 6.x component
>Affects Versions: ManifoldCF 2.3
>Reporter: Konrad Holl
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.5
>
>
> I’m using ManifoldCF 2.3 with Solr 5.4.1 and the Velocity templating engine. 
> I needed to do searches with ACLs enabled and installed the plugin. 
> Unfortunately it is not possible to use the login information provided by 
> Jetty in the Solr plugin.
> As of Solr 5.3 it is possible to extract the authenticated user from the 
> SolrQueryRequest object: 
> http://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/request/SolrQueryRequest.html#getUserPrincipal().
>  I added these lines to the code in 
> org.apache.solr.mcf.ManifoldCFSearchComponent before the parameters for the 
> authenticated user name are evaluated:
> {code}
> String authDomain = (String)args.get("AuthDomain");
> if (rb.req.getUserPrincipal() != null) {
>   domainMap.put("", rb.req.getUserPrincipal().getName() +
>       ((authDomain == null) ? "" : "@" + authDomain));
> }
> else {
>   // Get the authenticated user name from the parameters
> {code}
> I also needed an additional setting “AuthDomain” in the search component 
> configuration (solrconfig.xml). Now I can use Velocity even for documents 
> with ACLs :o)
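The lookup order in this snippet can be sketched without Solr, using only `java.security.Principal`: prefer the container-authenticated principal, append `@` plus the configured domain when one is set, and otherwise fall back to the user name from the request parameters. The class and method names are hypothetical, and the parameter fallback stands in for whatever the search component actually does in its `else` branch.

```java
import java.security.Principal;

// Hypothetical helper mirroring the lookup order in the snippet above;
// it is not part of the ManifoldCF Solr plugin.
final class AuthenticatedUserResolver {
  private AuthenticatedUserResolver() {}

  /**
   * @param principal  container-authenticated principal, or null
   * @param authDomain optional "AuthDomain" setting, or null
   * @param paramUser  user name taken from request parameters, or null
   * @return the user name to use for ACL filtering, or null if none is known
   */
  static String resolve(Principal principal, String authDomain, String paramUser) {
    if (principal != null) {
      // Same expression as the snippet: append "@domain" only when configured.
      return principal.getName() + ((authDomain == null) ? "" : "@" + authDomain);
    }
    // No authenticated principal: fall back to the request parameter.
    return paramUser;
  }
}
```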



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1324) Not all SharePoint Metadata Fields are returned - 2

2016-06-21 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341971#comment-15341971
 ] 

Konstantin Avdeev commented on CONNECTORS-1324:
---

Thank you very much for the prompt feedback and for the patch!

Just an idea: it would be great to have the up-to-date mapping right in the 
search index, so that the frontend could implement logic to map the names back, 
e.g.:

{{Name="internal_name" DisplayName="pretty_name"}} would produce the following 
meta-data:
{code:javascript}
"internal_name": "value",
"internal_name_DisplayName": "pretty_name",
{code}

What do you think? Thanks again!
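A minimal sketch of the proposed naming scheme as a post-processing step over the fetched values. The `_DisplayName` suffix follows the example above; the class, the method, and the plain `Map` inputs are assumptions for illustration, not part of the SharePoint connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the proposal; not part of the SharePoint connector.
final class DisplayNameMetadata {
  private DisplayNameMetadata() {}

  /**
   * Copies each value under its internal field name and, when the display
   * name differs, adds "internal_DisplayName" holding the display name.
   *
   * @param values       internal field name to field value
   * @param displayNames internal field name to display name (entries may be missing)
   */
  static Map<String, String> build(Map<String, String> values, Map<String, String> displayNames) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : values.entrySet()) {
      String internal = e.getKey();
      out.put(internal, e.getValue());
      String display = displayNames.get(internal);
      // Only emit the mapping field when it carries extra information.
      if (display != null && !display.equals(internal)) {
        out.put(internal + "_DisplayName", display);
      }
    }
    return out;
  }
}
```

Emitting the extra field only when the two names differ is a design choice to avoid doubling the field count; emitting it unconditionally would also be reasonable.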

> Not all SharePoint Metadata Fields are returned - 2
> ---
>
> Key: CONNECTORS-1324
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1324
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.4
> Environment: Java 1.8, Windows x64, Sharepoint 2013
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1324.patch
>
>
> Hello Karl,
> This is a follow up ticket for 
> [1284|https://issues.apache.org/jira/browse/CONNECTORS-1284].
> There is still a problem with getting metadata from SharePoint lists.
> E.g. I'm missing the "title" and "description" fields in the result documents.
> Let's take the "title" field from a list document:
> {code:xml|title=DEBUG: SharePoint: getFieldList xml response:}
> ...
>  Name="Title"
> DisplayName="Task Name" Required="TRUE" 
> SourceID="http://schemas.microsoft.com/sharepoint/v3;
> StaticName="Title" FromBaseType="TRUE" Sealed="TRUE" ColName="nvarchar1" 
> />
> ...
> {code}
> The field {{Name}} is the internal (technical) field name, and 
> {{DisplayName}} is the frontend (user-friendly) name. 
> The connector maps {{Name}} to {{DisplayName}} when it is preparing the 
> request:
> {code:java|title=SharePointRepository.java}
> for (String field : fieldNames.keySet())
> {
>   String value = fieldNames.get(field);
>   fields[j++] = (value==null)?field:value;
> {code}
> Changing the last two lines to:
> {code:java}
>   fields[j++] = field;
> {code}
> solves the problem.
> I doubt the {{DisplayName}} should be used at all, as it can contain 
> non-ASCII characters, e.g.:
> {code:xml}
> ...
>  Name="Title" DisplayName="Überschrift" ...
> {code}
> Currently, only fields with the same Name/DisplayName values can be indexed.
> Could you please look into this?
> Thank you!





[jira] [Created] (CONNECTORS-1324) Not all SharePoint Metadata Fields are returned - 2

2016-06-21 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1324:
-

 Summary: Not all SharePoint Metadata Fields are returned - 2
 Key: CONNECTORS-1324
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1324
 Project: ManifoldCF
  Issue Type: Bug
  Components: SharePoint connector
Affects Versions: ManifoldCF 2.4
 Environment: Java 1.8, Windows x64, Sharepoint 2013
Reporter: Konstantin Avdeev


Hello Karl,

This is a follow up ticket for 
[1284|https://issues.apache.org/jira/browse/CONNECTORS-1284].
There is still a problem with getting metadata from SharePoint lists.

E.g. I'm missing the "title" and "description" fields in the result documents.

Let's take the "title" field from a list document:
{code:xml|title=DEBUG: SharePoint: getFieldList xml response:}
...
http://schemas.microsoft.com/sharepoint/v3;
StaticName="Title" FromBaseType="TRUE" Sealed="TRUE" ColName="nvarchar1" />
...
{code}

The field {{Name}} is the internal (technical) field name, and {{DisplayName}} 
is the frontend (user-friendly) name. 
The connector maps {{Name}} to {{DisplayName}} when it is preparing the request:
{code:java|title=SharePointRepository.java}
for (String field : fieldNames.keySet())
{
  String value = fieldNames.get(field);
  fields[j++] = (value==null)?field:value;
{code}

Changing the last two lines to:
{code:java}
  fields[j++] = field;
{code}
solves the problem.
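For illustration, the changed loop as a self-contained method: every requested field uses its internal {{Name}}, and the display name is ignored. The sketch assumes `fieldNames` maps internal names to display names (possibly null), as the snippet suggests; the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical wrapper around the proposed one-line change; not the actual
// SharePointRepository code.
final class FieldListBuilder {
  private FieldListBuilder() {}

  /** Returns the internal field names to request, ignoring display names. */
  static String[] internalFieldNames(Map<String, String> fieldNames) {
    String[] fields = new String[fieldNames.size()];
    int j = 0;
    for (String field : fieldNames.keySet()) {
      // Previously: fields[j++] = (value == null) ? field : value;
      // The display name fails when it differs from the internal name
      // (e.g. "Task Name" vs "Title") or contains non-ASCII characters.
      fields[j++] = field;
    }
    return fields;
  }
}
```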
I doubt the {{DisplayName}} should be used at all, as it can contain non-ASCII 
characters, e.g.:
{code:xml}
...


[jira] [Commented] (CONNECTORS-1312) jcifs.smb.SmbException: Connection reset by peer: socket write error

2016-05-07 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275360#comment-15275360
 ] 

Konstantin Avdeev commented on CONNECTORS-1312:
---

Yes, we are crawling the server too "hard": there are a lot of "all pipes are 
busy" warnings, but the file server doesn't seem to be under heavy load.
So, could we add "connection reset by peer" to the list of non-severe 
exceptions?
Thanks!
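A minimal sketch of how a connector might recognize this case, assuming the message text is reachable either on the top-level exception (as in the log, where the SmbException repeats it) or somewhere in its cause chain. The class name is hypothetical, and matching on message text is fragile across JDKs and operating systems.

```java
// Hypothetical helper for illustration; not part of ManifoldCF or jcifs.
final class TransientErrors {
  private TransientErrors() {}

  /**
   * Returns true if the throwable or any of its causes has a message that
   * looks like a connection reset. The exact wording is JDK- and OS-specific
   * (the log above shows the Windows text), so this is only a heuristic.
   */
  static boolean isConnectionReset(Throwable t) {
    Throwable current = t;
    // The depth limit guards against very long or cyclic cause chains.
    for (int depth = 0; current != null && depth < 32; depth++) {
      String msg = current.getMessage();
      if (msg != null && msg.contains("Connection reset")) {
        return true;
      }
      current = current.getCause();
    }
    return false;
  }
}
```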

> jcifs.smb.SmbException: Connection reset by peer: socket write error
> 
>
> Key: CONNECTORS-1312
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1312
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.5
> Environment: Windows x64, java 1.8.x
>Reporter: Konstantin Avdeev
>
> hi Karl,
> we've found another JCIFS exception: Windows share jobs stop when 
> encountering a "Connection reset by peer" error, e.g.:
> {code}
> ERROR 2016-05-03 15:29:24,209 (Worker thread '80') - JCIFS: SmbException 
> tossed processing smb://server.domain.com/path/file.ppt
> jcifs.smb.SmbException: Connection reset by peer: socket write error
> java.net.SocketException: Connection reset by peer: socket write error
>   at java.net.SocketOutputStream.socketWrite0(Native Method)
>   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>   at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>   at jcifs.smb.SmbTransport.doSend(SmbTransport.java:453)
>   at jcifs.util.transport.Transport.sendrecv(Transport.java:67)
>   at jcifs.smb.SmbTransport.send(SmbTransport.java:655)
>   at jcifs.smb.SmbSession.send(SmbSession.java:238)
>   at jcifs.smb.SmbTree.send(SmbTree.java:119)
>   at jcifs.smb.SmbFile.send(SmbFile.java:775)
>   at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>   at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at java.io.FilterInputStream.read(FilterInputStream.java:107)
>   at java.nio.file.Files.copy(Files.java:2908)
>   at java.nio.file.Files.copy(Files.java:3027)
>   at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587)
>   at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615)
>   at 
> org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:358)
>   at 
> org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:424)
>   at 
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
>   at 
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
>   at 
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227)
>   at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224)
>   at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075)
>   at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
>   at 
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>   at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>   at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>   at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:979)
>   at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> {code}
> Current workaround: restart the job (manually or via the scheduler).
> Clearly, there are many errors for which it makes no sense to skip a 
> failed URL and continue the job, e.g.:
> {code}
> Error: SmbAuthException thrown: Logon failure: unknown user name or bad 
> password.
> {code}
> I'm thinking about a general solution, such as defining a list (through the UI 
> or properties.xml) of non-severe exceptions, like "file busy" or "symlink 
> detected", so that admins can specify when the crawler should stop and when 
> it should retry, or skip and move on.
> What do you 

[jira] [Commented] (CONNECTORS-1311) Dependencies issues

2016-05-04 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271450#comment-15271450
 ] 

Konstantin Avdeev commented on CONNECTORS-1311:
---

Thank you for the quick fix for the first issue!
Would you mind commenting on the others?

> Dependencies issues
> ---
>
> Key: CONNECTORS-1311
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1311
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build
>Affects Versions: ManifoldCF 2.5
> Environment: any
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
>
> There are several issues with the dependencies:
> 1) POI should be 3.13, since Tika 1.12 uses that version. With POI 3.14, Tika 
> cannot parse presentation files (.pptx):
> {code}
> FATAL 2016-05-03 10:39:16,821 (Worker thread '0') - Error tossed: 
> org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder;
> java.lang.NoSuchMethodError: 
> org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder;
>   at 
> org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.extractContent(XSLFPowerPointExtractorDecorator.java:154)
>   at 
> org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:88)
>   at 
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>   at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at 
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
> {code}
> 2) jcifs 1.3.17 is currently used, but 1.3.18 is available.
> 3) The Java Advanced Imaging (JAI) and JBIG2 format libraries are not 
> included, but they are required for parsing embedded images.
> Thank you!





[jira] [Created] (CONNECTORS-1312) jcifs.smb.SmbException: Connection reset by peer: socket write error

2016-05-04 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1312:
-

 Summary: jcifs.smb.SmbException: Connection reset by peer: socket 
write error
 Key: CONNECTORS-1312
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1312
 Project: ManifoldCF
  Issue Type: Bug
  Components: JCIFS connector
Affects Versions: ManifoldCF 2.5
 Environment: Windows x64, java 1.8.x
Reporter: Konstantin Avdeev


hi Karl,

we've found another JCIFS exception: Windows share jobs stop when encountering 
a "Connection reset by peer" error, e.g.:
{code}
ERROR 2016-05-03 15:29:24,209 (Worker thread '80') - JCIFS: SmbException tossed 
processing smb://server.domain.com/path/file.ppt
jcifs.smb.SmbException: Connection reset by peer: socket write error
java.net.SocketException: Connection reset by peer: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at jcifs.smb.SmbTransport.doSend(SmbTransport.java:453)
at jcifs.util.transport.Transport.sendrecv(Transport.java:67)
at jcifs.smb.SmbTransport.send(SmbTransport.java:655)
at jcifs.smb.SmbSession.send(SmbSession.java:238)
at jcifs.smb.SmbTree.send(SmbTree.java:119)
at jcifs.smb.SmbFile.send(SmbFile.java:775)
at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at java.nio.file.Files.copy(Files.java:2908)
at java.nio.file.Files.copy(Files.java:3027)
at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587)
at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615)
at 
org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:358)
at 
org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:424)
at 
org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
at 
org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
at 
org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227)
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224)
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075)
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
at 
org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
at 
org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:979)
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
{code}

Current workaround: restart the job (manually or via the scheduler).

Clearly, there are many errors for which it makes no sense to skip a 
failed URL and continue the job, e.g.:
{code}
Error: SmbAuthException thrown: Logon failure: unknown user name or bad 
password.
{code}

I'm thinking about a general solution, such as defining a list (through the UI or 
properties.xml) of non-severe exceptions, like "file busy" or "symlink 
detected", so that admins can specify when the crawler should stop and when it 
should retry, or skip and move on.

What do you think?
Thank you!
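The proposed list could be modeled as ordered rules that map a message fragment to an action, with anything unmatched falling back to a default. This is a hypothetical sketch: the class, the rule format, and the fragments used with it are assumptions, not an existing ManifoldCF setting, and in practice the rules would be loaded from the UI or properties.xml rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an admin-configurable error policy; not a ManifoldCF API.
final class ErrorPolicy {
  enum Action { ABORT, RETRY, SKIP }

  private static final class Rule {
    final String messageFragment;
    final Action action;

    Rule(String messageFragment, Action action) {
      this.messageFragment = messageFragment;
      this.action = action;
    }
  }

  private final List<Rule> rules = new ArrayList<>();
  private final Action defaultAction;

  ErrorPolicy(Action defaultAction) {
    this.defaultAction = defaultAction;
  }

  /** Adds a rule; rules are checked in insertion order and the first match wins. */
  ErrorPolicy on(String messageFragment, Action action) {
    rules.add(new Rule(messageFragment, action));
    return this;
  }

  /** Decides what to do with a failed document based on its error message. */
  Action decide(Throwable error) {
    String msg = (error == null) ? null : error.getMessage();
    if (msg != null) {
      for (Rule r : rules) {
        if (msg.contains(r.messageFragment)) {
          return r.action;
        }
      }
    }
    return defaultAction;
  }
}
```

Defaulting to `ABORT` keeps unknown errors, such as the logon failure above, visible instead of silently skipping documents.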






[jira] [Created] (CONNECTORS-1311) Dependencies issues

2016-05-04 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1311:
-

 Summary: Dependencies issues
 Key: CONNECTORS-1311
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1311
 Project: ManifoldCF
  Issue Type: Bug
  Components: Build
Affects Versions: ManifoldCF 2.5
 Environment: any
Reporter: Konstantin Avdeev


There are several issues with the dependencies:

1) POI should be 3.13, since Tika 1.12 uses that version. With POI 3.14, Tika 
cannot parse presentation files (.pptx):
{code}
FATAL 2016-05-03 10:39:16,821 (Worker thread '0') - Error tossed: 
org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder;
java.lang.NoSuchMethodError: 
org.apache.poi.xslf.usermodel.XSLFTextShape.getTextType()Lorg/apache/poi/xslf/usermodel/Placeholder;
at 
org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.extractContent(XSLFPowerPointExtractorDecorator.java:154)
at 
org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:88)
at 
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
{code}

2) jcifs 1.3.17 is currently used, but 1.3.18 is available.

3) The Java Advanced Imaging (JAI) and JBIG2 format libraries are not 
included, but they are required for parsing embedded images.
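A `NoSuchMethodError` like the one in item 1 means the class on the classpath lacks a method with the exact descriptor Tika was compiled against. A startup check could detect this before crawling. The sketch uses only `java.lang.reflect`; the class and method names are hypothetical and not taken from ManifoldCF.

```java
import java.lang.reflect.Method;

// Hypothetical preflight check; not part of ManifoldCF or Tika.
final class ApiCheck {
  private ApiCheck() {}

  /**
   * Returns true if the class can be loaded and has a public no-argument
   * method with the given name and return type. NoSuchMethodError reports
   * the full descriptor, so the return type is compared as well.
   */
  static boolean hasNoArgMethod(String className, String methodName, String returnTypeName) {
    try {
      // Load without initializing, to avoid running static initializers.
      Class<?> cls = Class.forName(className, false, ApiCheck.class.getClassLoader());
      Method m = cls.getMethod(methodName);
      return m.getReturnType().getName().equals(returnTypeName);
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      return false;
    }
  }
}
```

With the names from the trace, `ApiCheck.hasNoArgMethod("org.apache.poi.xslf.usermodel.XSLFTextShape", "getTextType", "org.apache.poi.xslf.usermodel.Placeholder")` would be expected to return false with POI 3.14 on the classpath, assuming the descriptor in the error is accurate. It only covers public no-argument methods.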

Thank you!





[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205

2016-04-30 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265421#comment-15265421
 ] 

Konstantin Avdeev commented on CONNECTORS-1305:
---

Seems to be working!
before:
{code}
ERROR 2016-04-30 20:35:17,664 (Worker thread '11') - JCIFS: SmbException tossed 
processing 
smb://localhost/share/longDir-0/longDir-1/longDir-2/longDir-3/longDir-4/longDir-5/longDir-6/longDir-7/longDir-8/longDir-9/longDir-10/longDir-11/longDir-12/longDir-13/longDir-14/longDir-15/longDir-16/longDir-17/longDir-18/longDir-19/longDir-20/longDir-21/longDir-22/longDir-23/longDir-24/longDir-25/longDir-26/longDir-27/longDir-28/longDir-29/longDir-30/longDir-31/longDir-32/longDir-33/longDir-34/longDir-35/longDir-36/longDir-37/longDir-38/longDir-39/longDir-40/longDir-41/longDir-42/longDir-43/longDir-44/longDir-45/
jcifs.smb.SmbException: 0xC0000205
 INFO 2016-04-30 20:35:17,671 (Worker thread '11') - Aborting job 1460647853267 
due to error 'SmbException tossed: 0xC0000205'
{code}
after:
{code}
 WARN 2016-04-30 20:38:45,769 (Worker thread '86') - JCIFS: Out of resources 
exception reading document/directory 
smb://localhost/share/longDir-0/longDir-1/longDir-2/longDir-3/longDir-4/longDir-5/longDir-6/longDir-7/longDir-8/longDir-9/longDir-10/longDir-11/longDir-12/longDir-13/longDir-14/longDir-15/longDir-16/longDir-17/longDir-18/longDir-19/longDir-20/longDir-21/longDir-22/longDir-23/longDir-24/longDir-25/longDir-26/longDir-27/longDir-28/longDir-29/longDir-30/longDir-31/longDir-32/longDir-33/longDir-34/longDir-35/longDir-36/longDir-37/longDir-38/longDir-39/longDir-40/longDir-41/longDir-42/longDir-43/longDir-44/longDir-45/
 - skipping
{code}
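The change in behaviour can be thought of as classifying the NT status code: STATUS_INSUFF_SERVER_RESOURCES (0xC0000205) now skips the document instead of aborting the job. The sketch is a hypothetical illustration in plain Java, not the code from CONNECTORS-1305.patch, and it takes the status as an `int` rather than reading it from a jcifs exception.

```java
// Hypothetical classification of NT status codes; not the CONNECTORS-1305 patch.
final class NtStatusPolicy {
  private NtStatusPolicy() {}

  /** STATUS_INSUFF_SERVER_RESOURCES, as listed in the NTSTATUS table linked in the issue. */
  static final int STATUS_INSUFF_SERVER_RESOURCES = 0xC0000205;

  enum Action { ABORT_JOB, SKIP_DOCUMENT }

  static Action classify(int ntStatus) {
    // The server cannot service this particular request, so skip the
    // document (a later crawl may try it again) rather than abort the job.
    if (ntStatus == STATUS_INSUFF_SERVER_RESOURCES) {
      return Action.SKIP_DOCUMENT;
    }
    return Action.ABORT_JOB;
  }

  /** Formats a status as eight hex digits, e.g. 0xC0000205. */
  static String format(int ntStatus) {
    return String.format("0x%08X", ntStatus);
  }
}
```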

Thank you for the patch!

> Windows Share connector: SmbException tossed: 0xC0000205
> 
>
> Key: CONNECTORS-1305
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1305
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.4
> Environment: Windows server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1305.patch
>
>
> Windows share jobs stop when encountering an [Insufficient server resources 
> exist to complete the 
> request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply 
> (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES).
> Is it possible to catch that exception as well?
> Thank you!





[jira] [Commented] (CONNECTORS-1307) Tika extractor infinite loop on error

2016-04-29 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264108#comment-15264108
 ] 

Konstantin Avdeev commented on CONNECTORS-1307:
---

Tika 1.12 solved the StackOverflowError. But, as you said, it has a lot of 
dependencies, and they have changed!
So I have to figure out which libs need to be updated as well...
Thank you for your help, as usual!


> Tika extractor infinite loop on error
> -
>
> Key: CONNECTORS-1307
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1307
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika extractor
>Affects Versions: ManifoldCF 2.4
> Environment: windows 64bit, java version "1.8.0_77", 
> pdfbox-1.8.10.jar, tika-parsers-1.10.jar
>Reporter: Konstantin Avdeev
>
> The Tika extractor gets stuck (it keeps trying to parse the same document 
> again and again) on the following error:
> {code}
> FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null
> java.lang.StackOverflowError
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
> {code}
> -Xss is left at the default, which I believe is 512k.
> We could increase the thread stack size, but I think this error should not 
> lead to an endless retry loop.
> Thanks a lot!





[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205

2016-04-29 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263950#comment-15263950
 ] 

Konstantin Avdeev commented on CONNECTORS-1305:
---

It seems to be a very long path, created by a deployment script or something 
similar. I can't even navigate deeper using Windows Explorer; it just gets 
stuck.
When the server replies with the status "insufficient resources", it would make 
sense to skip that URL and re-check it next time (of course, that will never 
succeed in this particular case) rather than cancel the whole job. That's just 
my opinion.
Thanks!

> Windows Share connector: SmbException tossed: 0xC0000205
> 
>
> Key: CONNECTORS-1305
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1305
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.4
> Environment: Windows server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
>
> Windows share jobs stop when encountering an [Insufficient server resources 
> exist to complete the 
> request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply 
> (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES).
> Is it possible to catch that exception as well?
> Thank you!





[jira] [Commented] (CONNECTORS-1307) Tika extractor infinite loop on error

2016-04-29 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263938#comment-15263938
 ] 

Konstantin Avdeev commented on CONNECTORS-1307:
---

Fair enough :)
Is it worth trying to replace the Tika and PDFBox libraries with the latest 
ones?
Thanks!

> Tika extractor infinite loop on error
> -
>
> Key: CONNECTORS-1307
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1307
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika extractor
>Affects Versions: ManifoldCF 2.4
> Environment: windows 64bit, java version "1.8.0_77", 
> pdfbox-1.8.10.jar, tika-parsers-1.10.jar
>Reporter: Konstantin Avdeev
>
> The Tika extractor gets stuck (it keeps trying to parse the same document 
> again and again) on the following error:
> {code}
> FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null
> java.lang.StackOverflowError
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
>   at 
> org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
> {code}
> -Xss is left at the default, which I believe is 512k.
> We could increase the thread stack size, but I think this error should not 
> lead to an endless retry loop.
> Thanks a lot!





[jira] [Created] (CONNECTORS-1307) Tika extractor infinite loop on error

2016-04-29 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1307:
-

 Summary: Tika extractor infinite loop on error
 Key: CONNECTORS-1307
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1307
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tika extractor
Affects Versions: ManifoldCF 2.4
 Environment: windows 64bit, java version "1.8.0_77", 
pdfbox-1.8.10.jar, tika-parsers-1.10.jar
Reporter: Konstantin Avdeev


The Tika extractor gets stuck (it keeps trying to parse the same document again 
and again) on the following error:
{code}
FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null
java.lang.StackOverflowError
at 
org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at 
org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250)
at 
org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at 
org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at 
org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at 
org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
at 
org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
at 
org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
at 
org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
{code}

-Xss is left at the default, which, I believe, is 512k.
We could increase the thread stack size, but I think this error should not lead 
to such a situation.
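The repeated `PDF2XHTML.extractImages` frames point to unbounded recursion. Assuming (the trace does not confirm this) that the recursion follows resources that refer back to an ancestor, the usual remedy is to remember what has already been visited. The following is a minimal plain-Java sketch with a hypothetical `ResourceNode` type, not Tika or PDFBox code; it also uses an explicit stack, so deep but acyclic nesting cannot overflow the thread stack either.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a PDF resource dictionary; a child may point
// back to an ancestor, which is the situation assumed here.
final class ResourceNode {
    final String name;
    final List<ResourceNode> children = new ArrayList<>();

    ResourceNode(String name) {
        this.name = name;
    }
}

final class CycleSafeWalker {
    // Depth-first traversal with an explicit stack (no recursion, so no
    // StackOverflowError) and an identity-based visited set (so cycles end).
    static List<String> collectNames(ResourceNode root) {
        Set<ResourceNode> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<ResourceNode> stack = new ArrayDeque<>();
        List<String> names = new ArrayList<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            ResourceNode node = stack.pop();
            if (!visited.add(node)) {
                continue; // already processed: shared node or cycle
            }
            names.add(node.name);
            // Push in reverse so children are visited in declaration order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return names;
    }
}
```

With a cycle a -> b -> a plus a -> c, the walk returns a, b, c once each and terminates.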
Thanks a lot!




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205

2016-04-28 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262601#comment-15262601
 ] 

Konstantin Avdeev commented on CONNECTORS-1305:
---

FYI: I don't use DFS paths; the connector crawls the servers directly.

> Windows Share connector: SmbException tossed: 0xC0000205
> 
>
> Key: CONNECTORS-1305
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1305
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.4
> Environment: Windows server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
>
> Windows share jobs stop when encountering an [Insufficient server resources 
> exist to complete the 
> request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply 
> (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES).
> Is it possible to catch that exception as well?
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205

2016-04-28 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262516#comment-15262516
 ] 

Konstantin Avdeev edited comment on CONNECTORS-1305 at 4/28/16 4:59 PM:


I think it's a permanent one. Fortunately, I got the full stack trace for 
it: it turned out that the problem is a very long path:
{code}
ERROR 2016-04-28 17:34:43,332 (Worker thread '89') - JCIFS: SmbException tossed 
processing 
smb://server.domain.com/Share/dir1/dir2/dir3/dir4/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/.metadata/.plugins/com.collabnet.subversion.merge/
jcifs.smb.SmbException: 0xC0000205
{code}
The connector tried to access that path four times, getting the same error 
from the server every time, and then canceled the job.
Basically, the job cannot complete because of this odd path, so this kind of 
exception should be logged as a WARN, not an ERROR.

Thanks!

P.S. I've added _dir4/repeating-dir_ to the exclude list; hopefully, it will 
skip that bad directory on the next run.
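One way to get the behaviour asked for here is to retry a failing document a bounded number of times and then skip it with a warning instead of failing the whole job. This is a minimal plain-Java sketch, not ManifoldCF's actual error handling: the class names are hypothetical, the limit of four mirrors the four attempts described above, and which statuses count as document-level problems is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;

// Possible outcomes after an SMB call for one document has failed.
enum SmbFailureOutcome { RETRY, SKIP_DOCUMENT, ABORT_JOB }

final class SmbRetryPolicy {
    static final int STATUS_INSUFF_SERVER_RESOURCES = 0xC0000205;

    // Hypothetical set of statuses treated as a problem with this one
    // document (e.g. an over-long path) rather than with the whole job.
    private static final Set<Integer> DOCUMENT_LEVEL =
            new HashSet<>(Arrays.asList(STATUS_INSUFF_SERVER_RESOURCES));

    private final int maxAttempts;

    SmbRetryPolicy(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // failedAttempts counts the failures so far for this document, including the current one.
    SmbFailureOutcome decide(int ntStatus, int failedAttempts) {
        if (failedAttempts < maxAttempts) {
            return SmbFailureOutcome.RETRY;
        }
        return DOCUMENT_LEVEL.contains(ntStatus)
                ? SmbFailureOutcome.SKIP_DOCUMENT
                : SmbFailureOutcome.ABORT_JOB;
    }

    // Only a job-level abort is logged as an error; retries and skips are warnings.
    static Level logLevel(SmbFailureOutcome outcome) {
        return outcome == SmbFailureOutcome.ABORT_JOB ? Level.SEVERE : Level.WARNING;
    }
}
```

A caller would log at `logLevel(outcome)` and, on `SKIP_DOCUMENT`, move on to the next document instead of aborting the job.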


was (Author: kavdeev):
I think it's a permanent one. Fortunately, I got the full stack trace for 
it: it turned out that the problem is a very long path:

ERROR 2016-04-28 17:34:43,332 (Worker thread '89') - JCIFS: SmbException tossed 
processing 
smb://server.domain.com/Share/dir1/dir2/dir3/dir4/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/.metadata/.plugins/com.collabnet.subversion.merge/
jcifs.smb.SmbException: 0xC0000205

The connector tried to access that path four times, getting the same error 
from the server every time, and then canceled the job.
Basically, the job cannot complete because of this odd path, so this kind of 
exception should be logged as a WARN, not an ERROR.

Thanks!


> Windows Share connector: SmbException tossed: 0xC0000205
> 
>
> Key: CONNECTORS-1305
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1305
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.4
> Environment: Windows server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
>
> Windows share jobs stop when encountering an [Insufficient server resources 
> exist to complete the 
> request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply 
> (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES).
> Is it possible to catch that exception as well?
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205

2016-04-28 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262516#comment-15262516
 ] 

Konstantin Avdeev commented on CONNECTORS-1305:
---

I think it's a permanent one. Fortunately, I got the full stack trace for 
it: it turned out that the problem is a very long path:

ERROR 2016-04-28 17:34:43,332 (Worker thread '89') - JCIFS: SmbException tossed 
processing 
smb://server.domain.com/Share/dir1/dir2/dir3/dir4/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/repeating-dir/.metadata/.plugins/com.collabnet.subversion.merge/
jcifs.smb.SmbException: 0xC0000205

The connector tried to access that path four times, getting the same error 
from the server every time, and then canceled the job.
Basically, the job cannot complete because of this odd path, so this kind of 
exception should be logged as a WARN, not an ERROR.

Thanks!


> Windows Share connector: SmbException tossed: 0xC0000205
> 
>
> Key: CONNECTORS-1305
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1305
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.4
> Environment: Windows server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.5
>
>
> Windows share jobs stop when encountering an [Insufficient server resources 
> exist to complete the 
> request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply 
> (0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES).
> Is it possible to catch that exception as well?
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CONNECTORS-1305) Windows Share connector: SmbException tossed: 0xC0000205

2016-04-28 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1305:
-

 Summary: Windows Share connector: SmbException tossed: 0xC0000205
 Key: CONNECTORS-1305
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1305
 Project: ManifoldCF
  Issue Type: Bug
  Components: JCIFS connector
Affects Versions: ManifoldCF 2.4
 Environment: Windows server 2012
Reporter: Konstantin Avdeev


Windows share jobs stop when encountering an [Insufficient server resources 
exist to complete the 
request|https://msdn.microsoft.com/en-us/library/cc704588.aspx] server reply 
(0xC0000205 - STATUS_INSUFF_SERVER_RESOURCES).
Is it possible to catch that exception as well?
Thank you!
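An NTSTATUS value such as 0xC0000205 packs several fields into 32 bits, and its top two bits give the severity. The plain-Java sketch below (the class name is hypothetical, not a jcifs API) decodes those fields as laid out in Microsoft's [MS-ERREF] specification. It shows, for example, that 0xC0000205 is error-class while 0x8000002D (STATUS_STOPPED_ON_SYMLINK) is only warning-class, which a connector could take into account when deciding whether a status should stop a job.

```java
// Decodes the fields of a 32-bit NTSTATUS value as laid out in [MS-ERREF]:
// 2 severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code.
final class NtStatusFields {
    static String severity(int status) {
        switch (status >>> 30) { // top two bits
            case 0: return "SUCCESS";
            case 1: return "INFORMATIONAL";
            case 2: return "WARNING";
            default: return "ERROR";
        }
    }

    static int facility(int status) {
        return (status >>> 16) & 0x0FFF;
    }

    static int code(int status) {
        return status & 0xFFFF;
    }

    // Formats the status as eight upper-case hex digits, e.g. 0xC0000205.
    static String toHex(int status) {
        return String.format("0x%08X", status);
    }
}
```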



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?

2016-04-17 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244745#comment-15244745
 ] 

Konstantin Avdeev commented on CONNECTORS-1299:
---

BTW, the current trunk version behaves in exactly the same way.

> "Seeding" phase of a job prevents starting others?
> --
>
> Key: CONNECTORS-1299
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
> Environment: Windows
>Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an SMB job for a share containing a lot of files (can be 
> reproduced with a \Windows directory :)) and then start a second job, the 
> second one stays for some time (depending on the amount of data being 
> processed by the first one) in the "running" state, showing {{"Active=1"}}, 
> and does not progress.
> Setting the log level to Debug did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?

2016-04-17 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244744#comment-15244744
 ] 

Konstantin Avdeev commented on CONNECTORS-1299:
---

Thanks a lot for the explanation!
Just curious: is this also described in the book 
(https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/)?
If so, I'm going to read it :)

--
best regards,
KA

> "Seeding" phase of a job prevents starting others?
> --
>
> Key: CONNECTORS-1299
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
> Environment: Windows
>Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an SMB job for a share containing a lot of files (can be 
> reproduced with a \Windows directory :)) and then start a second job, the 
> second one stays for some time (depending on the amount of data being 
> processed by the first one) in the "running" state, showing {{"Active=1"}}, 
> and does not progress.
> Setting the log level to Debug did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1297) Windows Share job stops upon a symbolic link

2016-04-17 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244743#comment-15244743
 ] 

Konstantin Avdeev commented on CONNECTORS-1297:
---

Great support again, thank you very much!
{code}
 INFO 2016-04-17 19:11:58,344 (Startup thread) - Marked job 1460829266713 for 
startup
 INFO 2016-04-17 19:12:15,642 (Startup thread) - Job 1460829266713 is now 
started
 WARN 2016-04-17 19:15:19,709 (Worker thread '74') - JCIFS: Symlink detected: 
smb://localhost/C$/Program Files/Java/jdk/
DEBUG 2016-04-17 19:15:42,943 (Idle cleanup thread) - Connection manager is 
shutting down
DEBUG 2016-04-17 19:15:42,943 (Idle cleanup thread) - http-outgoing-1: Close 
connection
DEBUG 2016-04-17 19:15:42,943 (Idle cleanup thread) - Connection manager shut 
down
{code}

> Windows Share job stops upon a symbolic link
> 
>
> Key: CONNECTORS-1297
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1297
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.3
> Environment: Windows 2012, Windows 10
>Reporter: Konstantin Avdeev
> Attachments: CONNECTORS-1297.patch
>
>
> Windows shares containing a symbolic link cannot be crawled: the job stops 
> with the exception:
> {code}
> Error: SmbException tossed: 0x8000002D
> {code}
> Stack trace example:
> {code}
> ERROR 2016-04-16 20:01:40,384 (Worker thread '76') - JCIFS: SmbException 
> tossed processing smb://localhost/C$/Program Files/Java/jdk/
> jcifs.smb.SmbException: 0x8000002D
>   at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
>   at jcifs.smb.SmbTransport.send(SmbTransport.java:640)
>   at jcifs.smb.SmbSession.send(SmbSession.java:238)
>   at jcifs.smb.SmbTree.send(SmbTree.java:119)
>   at jcifs.smb.SmbFile.send(SmbFile.java:775)
>   at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989)
>   at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741)
>   at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718)
>   at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707)
>   at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2295)
>   at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:788)
>   at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> {code}
> jdk is a symbolic directory link
> {code}
> 02.03.2016  12:38 jdk [jdk1.8.0]
> {code}
> Expected behaviour: treat a link as a regular directory/file, or at least 
> skip it and continue the job.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1297) Windows Share job stops upon a symbolic link

2016-04-16 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244536#comment-15244536
 ] 

Konstantin Avdeev commented on CONNECTORS-1297:
---

OK, that makes sense, but in the meantime, couldn't we just catch that 
particular exception so that the job can continue?
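A minimal sketch of catching only that one status, in plain Java. `NtStatusException` and `Lister` are hypothetical stand-ins for jcifs's exception and directory listing, not their real API. 0x8000002D is STATUS_STOPPED_ON_SYMLINK; the sketch records a warning and treats the link as an empty directory, while any other status is rethrown unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical exception carrying an NTSTATUS value (stand-in for jcifs.smb.SmbException).
final class NtStatusException extends Exception {
    private static final long serialVersionUID = 1L;
    final int ntStatus;

    NtStatusException(int ntStatus) {
        super(String.format("0x%08X", ntStatus));
        this.ntStatus = ntStatus;
    }
}

// Hypothetical directory-listing abstraction over the share.
interface Lister {
    List<String> list(String dir) throws NtStatusException;
}

final class SymlinkTolerantLister {
    static final int STATUS_STOPPED_ON_SYMLINK = 0x8000002D;

    final List<String> warnings = new ArrayList<>();

    // Returns the children of dir. A symlink stop is recorded as a warning and
    // treated as an empty directory; any other status propagates as before.
    List<String> childrenOf(Lister lister, String dir) throws NtStatusException {
        try {
            return lister.list(dir);
        } catch (NtStatusException e) {
            if (e.ntStatus != STATUS_STOPPED_ON_SYMLINK) {
                throw e;
            }
            warnings.add("Symlink detected, skipping: " + dir);
            return Collections.emptyList();
        }
    }
}
```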
Thanks! 

> Windows Share job stops upon a symbolic link
> 
>
> Key: CONNECTORS-1297
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1297
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.3
> Environment: Windows 2012, Windows 10
>Reporter: Konstantin Avdeev
>
> Windows shares containing a symbolic link cannot be crawled: the job stops 
> with the exception:
> {code}
> Error: SmbException tossed: 0x8000002D
> {code}
> Stack trace example:
> {code}
> ERROR 2016-04-16 20:01:40,384 (Worker thread '76') - JCIFS: SmbException 
> tossed processing smb://localhost/C$/Program Files/Java/jdk/
> jcifs.smb.SmbException: 0x8000002D
>   at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
>   at jcifs.smb.SmbTransport.send(SmbTransport.java:640)
>   at jcifs.smb.SmbSession.send(SmbSession.java:238)
>   at jcifs.smb.SmbTree.send(SmbTree.java:119)
>   at jcifs.smb.SmbFile.send(SmbFile.java:775)
>   at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989)
>   at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741)
>   at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718)
>   at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707)
>   at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2295)
>   at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:788)
>   at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> {code}
> jdk is a symbolic directory link
> {code}
> 02.03.2016  12:38 jdk [jdk1.8.0]
> {code}
> Expected behaviour: treat a link as a regular directory/file, or at least 
> skip it and continue the job.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2016-04-16 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1298:
--
Description: 
Every MCF restart leaves behind webapp temp dirs like these:

{code}
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/
{code}

(or under {{java.io.tmpdir}}, if set)

Expected behaviour: delete these dirs upon exit.
Could it help to set Jetty's {{persistTempDirectory}} to false for these 
contexts?

Thank you!

  was:
Every MCF restart leaves behind webapp temp dirs like these:

{code}
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/
{code}

(or under {{java.io.tmpdir}}, if set)

Expected behaviour: delete these dirs upon exit.
Could it help to set Jetty's {{persistTempDirectory}} to true for these 
contexts?

Thank you!


> Houskeeping: jetty webapp temp directories don't get removed upon shutdown
> --
>
> Key: CONNECTORS-1298
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1298
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
> Environment: Windows
>Reporter: Konstantin Avdeev
>Priority: Minor
>
> Every MCF restart leaves behind webapp temp dirs like these:
> {code}
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/
> C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/
> {code}
> (or under {{java.io.tmpdir}}, if set)
> Expected behaviour: delete these dirs upon exit.
> Could it help to set Jetty's {{persistTempDirectory}} to false for these 
> contexts?
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?

2016-04-16 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1299:
--
Description: 
Hello Karl, could you please clarify if this is a bug or a feature? :)
When I start an SMB job for a share containing a lot of files (can be 
reproduced with a \Windows directory :)) and then start a second job, the 
second one stays for some time (depending on the amount of data being 
processed by the first one) in the "running" state, showing {{"Active=1"}}, 
and does not progress.

Setting the log level to Debug did not shed any light on this, unfortunately.

It would be great if you could elaborate on that a little!
Thank you!

  was:
Hello Karl, could you please clarify if this is a bug or a feature? :)
When I start an SMB job for a share containing a lot of files (can be 
reproduced with a \Windows directory :)) and then start a second job, the 
second one stays for some time (depending on the amount of data being 
processed by the first one) in the "running" state, showing {{"Active=1"}}, 
and does not progress.

Setting Debug=true did not shed any light on this, unfortunately.

It would be great if you could elaborate on that a little!
Thank you!


> "Seeding" phase of a job prevents starting others?
> --
>
> Key: CONNECTORS-1299
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
> Environment: Windows
>Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an SMB job for a share containing a lot of files (can be 
> reproduced with a \Windows directory :)) and then start a second job, the 
> second one stays for some time (depending on the amount of data being 
> processed by the first one) in the "running" state, showing {{"Active=1"}}, 
> and does not progress.
> Setting the log level to Debug did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?

2016-04-16 Thread Konstantin Avdeev (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Avdeev updated CONNECTORS-1299:
--
Summary: "Seeding" phase of a job prevents starting others?  (was: 
"Seeding" phase of a job prevent starting others?)

> "Seeding" phase of a job prevents starting others?
> --
>
> Key: CONNECTORS-1299
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
> Environment: Windows
>Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an SMB job for a share containing a lot of files (can be 
> reproduced with a \Windows directory :)) and then start a second job, the 
> second one stays for some time (depending on the amount of data being 
> processed by the first one) in the "running" state, showing {{"Active=1"}}, 
> and does not progress.
> Setting Debug=true did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CONNECTORS-1299) "Seeding" phase of a job prevent starting others?

2016-04-16 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1299:
-

 Summary: "Seeding" phase of a job prevent starting others?
 Key: CONNECTORS-1299
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework crawler agent
 Environment: Windows
Reporter: Konstantin Avdeev


Hello Karl, could you please clarify if this is a bug or a feature? :)
When I start an SMB job for a share containing a lot of files (can be 
reproduced with a \Windows directory :)) and then start a second job, the 
second one stays for some time (depending on the amount of data being 
processed by the first one) in the "running" state, showing {{"Active=1"}}, 
and does not progress.

Setting Debug=true did not shed any light on this, unfortunately.

It would be great if you could elaborate on that a little!
Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CONNECTORS-1298) Houskeeping: jetty webapp temp directories don't get removed upon shutdown

2016-04-16 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1298:
-

 Summary: Houskeeping: jetty webapp temp directories don't get 
removed upon shutdown
 Key: CONNECTORS-1298
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1298
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework core
 Environment: Windows
Reporter: Konstantin Avdeev
Priority: Minor


Every MCF restart leaves behind webapp temp dirs like these:

{code}
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-crawler-ui.war-_mcf-crawler-ui-any-125313306681249528.dir/webapp/
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-6028962368901452542.dir/webapp/
C:/Windows/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-4370925025384089553.dir/webapp/
{code}

(or under {{java.io.tmpdir}}, if set)

Expected behaviour: delete these dirs upon exit.
Could it help to set Jetty's {{persistTempDirectory}} to true for these 
contexts?

Thank you!
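Until Jetty itself removes these directories, a housekeeping step could delete stale ones. This is a minimal plain-JDK sketch, not part of ManifoldCF or Jetty: the class name and the glob `jetty-*-mcf-*.dir` (derived from the directory names listed above) are assumptions. It deletes matching directories directly under a given temp root, deepest entries first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

final class StaleJettyTempCleaner {
    // Removes directories directly under tempRoot whose names match the glob
    // and returns them. The glob must be narrow enough to hit only MCF's dirs.
    static List<Path> clean(Path tempRoot, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(tempRoot, glob)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    matches.add(entry);
                }
            }
        }
        // Delete only after the directory stream is closed.
        for (Path dir : matches) {
            deleteRecursively(dir);
        }
        return matches;
    }

    // Deletes the deepest paths first so each directory is empty when it is removed.
    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.forEach(paths::add);
        }
        paths.sort(Comparator.reverseOrder());
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

For example, `StaleJettyTempCleaner.clean(Paths.get(System.getProperty("java.io.tmpdir")), "jetty-*-mcf-*.dir")` could run at startup, before the web applications are deployed, so it never touches the directories of the running instance. Note that it would also remove the directories of another MCF instance sharing the same temp directory.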



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CONNECTORS-1297) Windows Share job stops upon a symbolic link

2016-04-16 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1297:
-

 Summary: Windows Share job stops upon a symbolic link
 Key: CONNECTORS-1297
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1297
 Project: ManifoldCF
  Issue Type: Bug
  Components: JCIFS connector
Affects Versions: ManifoldCF 2.3
 Environment: Windows 2012, Windows 10
Reporter: Konstantin Avdeev


Windows shares containing a symbolic link cannot be crawled: the job stops 
with the exception:

{code}
Error: SmbException tossed: 0x8000002D
{code}

Stack trace example:
{code}
ERROR 2016-04-16 20:01:40,384 (Worker thread '76') - JCIFS: SmbException tossed 
processing smb://localhost/C$/Program Files/Java/jdk/
jcifs.smb.SmbException: 0x8000002D
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
at jcifs.smb.SmbTransport.send(SmbTransport.java:640)
at jcifs.smb.SmbSession.send(SmbSession.java:238)
at jcifs.smb.SmbTree.send(SmbTree.java:119)
at jcifs.smb.SmbFile.send(SmbFile.java:775)
at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989)
at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741)
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718)
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707)
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2295)
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:788)
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
{code}

jdk is a symbolic directory link
{code}
02.03.2016  12:38 jdk [jdk1.8.0]
{code}

Expected behaviour: treat a link as a regular directory/file, or at least skip 
it and continue the job.

Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored

2016-04-14 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241022#comment-15241022
 ] 

Konstantin Avdeev commented on CONNECTORS-1295:
---

Great support! Thank you! :)

> Windows Share Connector's job: Maximum document length parameter is ignored
> ---
>
> Key: CONNECTORS-1295
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1295
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.3
> Environment: Windows Server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.4
>
> Attachments: CONNECTORS-1295.patch
>
>
> It seems the Windows share jobs ignore the "Maximum document length" 
> parameter and download documents of any length, e.g.:
> Edit job -> Content Length -> Maximum document length:
> 50485760
> And from the history output:
> {code}
> 04-14-2016 10:52:32.813 access 
> smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif
> code:OK bytes:406334712 time:134843 
> {code}
> Any idea why this huge file was not rejected?
> Thanks a lot!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored

2016-04-14 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241008#comment-15241008
 ] 

Konstantin Avdeev commented on CONNECTORS-1295:
---

One more double check, from the "jobs" table: {{select description,docspec from 
jobs}}
{code:xml}

{code}

> Windows Share Connector's job: Maximum document length parameter is ignored
> ---
>
> Key: CONNECTORS-1295
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1295
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.3
> Environment: Windows Server 2012
>Reporter: Konstantin Avdeev
>Assignee: Karl Wright
> Fix For: ManifoldCF 2.4
>
> Attachments: CONNECTORS-1295.patch
>
>
> It seems the Windows share jobs ignore the "Maximum document length" 
> parameter and download documents of any length, e.g.:
> Edit job -> Content Length -> Maximum document length:
> 50485760
> And from the history output:
> {code}
> 04-14-2016 10:52:32.813 access 
> smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif
> code:OK bytes:406334712 time:134843 
> {code}
> Any idea why this huge file was not rejected?
> Thanks a lot!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored

2016-04-14 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240989#comment-15240989
 ] 

Konstantin Avdeev commented on CONNECTORS-1295:
---

Just double-checked the value: no special characters as far as I can see:
{code:html}
Maximum document length:

{code}


> Windows Share Connector's job: Maximum document length parameter is ignored
> ---
>
> Key: CONNECTORS-1295
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1295
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JCIFS connector
>Affects Versions: ManifoldCF 2.3
> Environment: Windows Server 2012
>Reporter: Konstantin Avdeev
>
> It seems the Windows share jobs ignore the "Maximum document length" 
> parameter and download documents of any length, e.g.:
> Edit job -> Content Length -> Maximum document length:
> 50485760
> And from the history output:
> {code}
> 04-14-2016 10:52:32.813 access 
> smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif
> code:OK bytes:406334712 time:134843 
> {code}
> Any idea why this huge file was not rejected?
> Thanks a lot!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CONNECTORS-1295) Windows Share Connector's job: Maximum document length parameter is ignored

2016-04-14 Thread Konstantin Avdeev (JIRA)
Konstantin Avdeev created CONNECTORS-1295:
-

 Summary: Windows Share Connector's job: Maximum document length 
parameter is ignored
 Key: CONNECTORS-1295
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1295
 Project: ManifoldCF
  Issue Type: Bug
  Components: JCIFS connector
Affects Versions: ManifoldCF 2.3
 Environment: Windows Server 2012
Reporter: Konstantin Avdeev


It seems the Windows share jobs ignore the "Maximum document length" parameter 
and download documents of any length, e.g.:

Edit job -> Content Length -> Maximum document length:
50485760

And from the history output:
{code}
04-14-2016 10:52:32.813 access 
smb://server.domain.com/share/dir1/dir2/dirXXX.../file.tif
code:OK bytes:406334712 time:134843 
{code}

Any idea why this huge file was not rejected?
Thanks a lot!
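The check the report expects can be sketched as follows. This is a minimal plain-Java example, not the connector's actual code, and the class name is hypothetical. With the values above, the fetched file (406334712 bytes) is about eight times the configured limit (50485760 bytes), so it should have been rejected before download.

```java
final class MaxLengthCheck {
    // Parses the configured "Maximum document length". Surrounding whitespace is
    // tolerated; anything that is not a plain (optionally signed) decimal number
    // raises NumberFormatException.
    static long parseLimit(String configured) {
        return Long.parseLong(configured.trim());
    }

    // Decide before fetching, from the length the share reports for the file,
    // so an oversized file is rejected without being downloaded.
    static boolean shouldFetch(long reportedLength, long limit) {
        return reportedLength <= limit;
    }
}
```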



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1286) Solr Plugin: Add support for User Principal

2016-04-04 Thread Konstantin Avdeev (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224309#comment-15224309
 ] 

Konstantin Avdeev commented on CONNECTORS-1286:
---

If the patch is simplified as follows:
{code:java}
if (rb.req.getUserPrincipal() != null) {
  // Add the Solr/Jetty login name under the empty-string domain key
  domainMap.put("", rb.req.getUserPrincipal().getName());
}
{code}
then the Solr/Jetty login will NOT supersede all of the formal 
authenticated-user parameters/domains passed into the component; it will 
simply be added to the {{domainMap}} when a principal is present. And we would 
not need a new config parameter like {{AuthDomain}}, since any modification of 
the user name (e.g. {{DOMAIN\USER}} -> {{u...@domain.com}}) can be achieved by 
MCF's user mapping.
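The difference between the two variants can be sketched with a plain map: under the simplified proposal, the principal's name is added under the empty-string domain key instead of replacing the request parameters. This is a minimal sketch, not the `ManifoldCFSearchComponent` source; `buildDomainMap`, the `ActiveDirectory` domain key and the user names are hypothetical, and it assumes the parameter-supplied users are written into the same map after the principal, so an explicit entry for the empty-string key would replace it.

```java
import java.security.Principal;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: the container-authenticated principal is ADDED to the
 * domain map rather than superseding the request parameters.
 */
public class PrincipalDomainMapSketch {

  static Map<String, String> buildDomainMap(Principal principal,
                                            Map<String, String> parameterDomains) {
    Map<String, String> domainMap = new HashMap<>();
    // The login supplied by Solr/Jetty (if any) goes under the empty-string key.
    if (principal != null) {
      domainMap.put("", principal.getName());
    }
    // Assumption: parameter-supplied users are evaluated afterwards, so they are
    // kept, and an explicit entry for the same key replaces the principal's name.
    domainMap.putAll(parameterDomains);
    return domainMap;
  }

  public static void main(String[] args) {
    Principal jettyUser = () -> "jdoe";              // hypothetical login name
    Map<String, String> params = new HashMap<>();
    params.put("ActiveDirectory", "EXAMPLE\\jdoe");  // hypothetical domain/user
    Map<String, String> result = buildDomainMap(jettyUser, params);
    System.out.println(result.get("") + " / " + result.get("ActiveDirectory"));
    // prints "jdoe / EXAMPLE\jdoe"
  }
}
```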

So, starting with Solr 5.3, users would be able to configure secure search 
out of the box :)
What do you think? Thanks!

> Solr Plugin: Add support for User Principal
> ---
>
> Key: CONNECTORS-1286
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1286
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr-5.x component
>Affects Versions: ManifoldCF 2.3
>Reporter: Konrad Holl
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.4
>
>
> I’m using ManifoldCF 2.3 with Solr 5.4.1 and the Velocity templating engine. 
> I needed to do searches with ACLs enabled and installed the plugin. 
> Unfortunately it is not possible to use the login information provided by 
> Jetty in the Solr plugin.
> As of Solr 5.3 it is possible to extract the authenticated user from the 
> SolrQueryRequest object: 
> http://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/request/SolrQueryRequest.html#getUserPrincipal().
> I added these lines to the code in 
> org.apache.solr.mcf.ManifoldCFSearchComponent before the evaluation of 
> parameters for authenticated user name:
> {code}
> String authDomain = (String)args.get("AuthDomain");
> if (rb.req.getUserPrincipal() != null) {
>   domainMap.put("", rb.req.getUserPrincipal().getName() + 
>       ((authDomain == null) ? "" : "@" + authDomain));
> }
> else {
>   // Get the authenticated user name from the parameters
> {code}
> I also needed an additional setting “AuthDomain” in the search component 
> configuration (solrconfig.xml). Now I can use Velocity even for documents 
> with ACLs :o)
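The user-name construction in the snippet above can be isolated as follows: the principal name is used as-is when no `AuthDomain` is configured, and is suffixed with `@` plus the domain otherwise. This is a minimal sketch mirroring that ternary; `AuthDomainQualifier`, `qualify` and the sample values are hypothetical.

```java
/**
 * Hypothetical helper isolating the name construction from the proposed patch:
 * principal name, optionally suffixed with "@" + AuthDomain.
 */
public class AuthDomainQualifier {

  static String qualify(String principalName, String authDomain) {
    // No AuthDomain configured: keep the principal name unchanged.
    return (authDomain == null) ? principalName : principalName + "@" + authDomain;
  }

  public static void main(String[] args) {
    System.out.println(qualify("jdoe", null));          // prints "jdoe"
    System.out.println(qualify("jdoe", "example.com")); // prints "jdoe@example.com"
  }
}
```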


