[jira] [Commented] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356993#comment-16356993
 ] 

Karl Wright commented on CONNECTORS-1494:
-

No idea why Linux is not allowing us to find it.  We simply use java.io classes 
to enumerate and access files.

Can you open a bug with Oracle?
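
One way to narrow this down is a small standalone check outside ManifoldCF. The sketch below is not ManifoldCF code; the class name `ListingCheck` and its `describe` method are made up for illustration. It lists a directory with plain `java.io.File`, reports whether each returned entry can be found again, and prints the JVM's encoding properties (`sun.jnu.encoding` is JVM-implementation specific). A frequent cause of this symptom on Linux is a JVM started under a locale such as `LANG=C` whose encoding cannot represent the file names, but that is an assumption to verify, not a confirmed diagnosis of this issue.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of ManifoldCF.
class ListingCheck {

    // Returns one line per directory entry: the name as Java decoded it,
    // and whether the same File object can be found and read again.
    static List<String> describe(File dir) {
        List<String> lines = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles() returns null when dir is not a readable directory.
            return lines;
        }
        for (File f : entries) {
            lines.add(f.getName() + " exists=" + f.exists() + " readable=" + f.canRead());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Encoding the JVM uses for file contents and for file names.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String line : describe(dir)) {
            System.out.println(line);
        }
    }
}
```

If a name prints with `?` or replacement characters, or shows `exists=false`, the JVM is probably not decoding names with the encoding the file system uses. Comparing this with the output of `locale` in the shell that starts ManifoldCF would be a reasonable next step.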


> Error crawling file system with file names having special characters.
> -
>
> Key: CONNECTORS-1494
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1494
> Project: ManifoldCF
> Issue Type: Bug
> Components: File system connector
> Affects Versions: ManifoldCF 2.9.1
> Reporter: Vinay
> Assignee: Karl Wright
> Priority: Critical
> Fix For: ManifoldCF 2.10
>
>
> I am crawling a file system mounted on a Linux machine, so the Repository 
> Connection is of type "File System". ManifoldCF is not picking up some files 
> whose names contain special characters.
> File ex: a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf
> exception: java.lang.NumberFormatException: For input string: ""
>      at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_151]
>      at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_151]
>      at java.lang.Long.<init>(Long.java:965) ~[?:1.8.0_151]
>      at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(DocumentFilter.java:513) ~[?:?]
>      at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(DocumentFilter.java:76) ~[?:?]
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(IncrementalIngester.java:503) ~[mcf-agents.jar:?]
>      at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(PipelineSpecification.java:47) ~[mcf-pull-agent.jar:?]
>      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:308) [mcf-pull-agent.jar:?]
>  FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string: ""



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1494.
-
   Resolution: Won't Fix
Fix Version/s: ManifoldCF 2.10

Not our problem.  Oracle ticket recommended.



[jira] [Updated] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated CONNECTORS-1494:
--
Priority: Critical  (was: Major)



[jira] [Commented] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356891#comment-16356891
 ] 

Vinay commented on CONNECTORS-1494:
---

Thanks, Karl. Although the above solution partially fixes the issue, we still 
see that ManifoldCF is not picking up files with names like 
"a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf" when run on a Linux machine. 
There are no errors in the logs.

If the same file is copied to a Windows machine and crawled by ManifoldCF on 
Windows, the file is picked up. Any idea why such files are not being picked up 
when running on Linux? With no error on the console, we are unable to figure it 
out.



[jira] [Commented] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356817#comment-16356817
 ] 

Karl Wright commented on CONNECTORS-1494:
-

Hi, this problem is actually coming from the Document Filter transformation 
connector.  It's unable to parse the max size field you entered in the job 
pipeline configuration.  That's indeed a bug -- the connector should have given 
you an error when you tried to save it -- but it's easy to fix on your part.
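
To illustrate the failure in the stack trace: `new Long("")` delegates to `Long.parseLong`, which throws `NumberFormatException` for an empty string. The sketch below shows the kind of save-time check described above. `MaxSizeField` and `parseMaxSize` are hypothetical names, not the Document Filter connector's actual code, and treating a blank field as "no limit" is an assumption made for the example.

```java
import java.util.OptionalLong;

// Hypothetical helper, not ManifoldCF's DocumentFilter code.
class MaxSizeField {

    // Parses a user-entered maximum size. A null or blank field is treated
    // as "no limit" (an assumption for this sketch). Anything that is not a
    // non-negative integer is rejected with a readable message instead of
    // failing later in a worker thread.
    static OptionalLong parseMaxSize(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            long value = Long.parseLong(raw.trim());
            if (value < 0) {
                throw new IllegalArgumentException("Max size must not be negative: " + raw);
            }
            return OptionalLong.of(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Max size is not a number: '" + raw + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseMaxSize("1048576").getAsLong());
        System.out.println(parseMaxSize("").isPresent());
        try {
            Long.parseLong(""); // the unguarded parse seen in the stack trace
        } catch (NumberFormatException e) {
            System.out.println("unguarded parse failed: " + e.getMessage());
        }
    }
}
```

Until the connector validates the field itself, entering a plain integer in the max size field of the job's Document Filter configuration avoids the unguarded parse.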





[jira] [Assigned] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1494:
---

Assignee: Karl Wright



[jira] [Updated] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated CONNECTORS-1494:
--
Description: 
I am crawling a file system mounted on a Linux machine, so the Repository 
Connection is of type "File System". ManifoldCF is not picking up some files 
whose names contain special characters.

File ex: a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf

exception: java.lang.NumberFormatException: For input string: ""
     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_151]
     at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_151]
     at java.lang.Long.<init>(Long.java:965) ~[?:1.8.0_151]
     at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(DocumentFilter.java:513) ~[?:?]
     at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(DocumentFilter.java:76) ~[?:?]
     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(IncrementalIngester.java:503) ~[mcf-agents.jar:?]
     at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(PipelineSpecification.java:47) ~[mcf-pull-agent.jar:?]
     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:308) [mcf-pull-agent.jar:?]
 FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string: ""

  was:
I am crawling a file system mounted on a Linux machine, so the Repository 
Connection is of type "File System". ManifoldCF is not picking up some files 
whose names contain special characters.

File ex: 2GHz_XY-SCDMA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf

exception: java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_151]
    at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_151]
    at java.lang.Long.<init>(Long.java:965) ~[?:1.8.0_151]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(DocumentFilter.java:513) ~[?:?]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(DocumentFilter.java:76) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(IncrementalIngester.java:503) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(PipelineSpecification.java:47) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:308) [mcf-pull-agent.jar:?]
FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string: ""




[jira] [Created] (CONNECTORS-1494) Error crawling file system with file names having special characters.

2018-02-08 Thread Vinay (JIRA)
Vinay created CONNECTORS-1494:
-

 Summary: Error crawling file system with file names having special characters.
 Key: CONNECTORS-1494
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1494
 Project: ManifoldCF
 Issue Type: Bug
 Components: File system connector
 Affects Versions: ManifoldCF 2.9.1
 Reporter: Vinay


I am crawling a file system mounted on a Linux machine, so the Repository 
Connection is of type "File System". ManifoldCF is not picking up some files 
whose names contain special characters.

File ex: 2GHz_XY-SCDMA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf

exception: java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_151]
    at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_151]
    at java.lang.Long.<init>(Long.java:965) ~[?:1.8.0_151]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(DocumentFilter.java:513) ~[?:?]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(DocumentFilter.java:76) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(IncrementalIngester.java:503) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(PipelineSpecification.java:47) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:308) [mcf-pull-agent.jar:?]
FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string: ""


