[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872161#comment-13872161 ]

Mane commented on TIKA-1121:

Any update on this one? I can see that the snapshot version has a fix for the files that were throwing a server error. We are getting plenty of those with version 1.4. Can you check and release a patch? Or is there any update on the release date of Tika 1.5? Thanks

Socket server text parsing error on large text files

Key: TIKA-1121
URL: https://issues.apache.org/jira/browse/TIKA-1121
Project: Tika
Issue Type: Bug
Components: cli
Affects Versions: 1.4
Environment: Ubuntu 10.04, 10.10, 12.04.02
Reporter: Dave Meikle
Assignee: Dave Meikle

As reported on the user list[1], when using the tika-app socket server command with the -t switch to parse text, the process hangs on large text files. This occurs on Ubuntu 10.04, 10.10 and 12.04.02.

[1] http://mail-archives.apache.org/mod_mbox/tika-user/201305.mbox/%3ccagxbzufxsj4h5jwdeux9hhd2fxttq1vsbm7u-vfsyge9vmr...@mail.gmail.com%3E

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845569#comment-13845569 ]

Mane commented on TIKA-1121:

I've tested with the Tika JAX-RS server. It now works for the cases where it was sending me a server error (I tried PDF files that still return Server Error on version 1.4). But tika-app in server mode (socket server) still hangs on a very large HTML file.
[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843289#comment-13843289 ]

Sergey Beryozkin commented on TIKA-1121:

That one seems to link to the latest one. I can see from a pom that it includes a CXF 2.7.8 dependency, but I'm not sure the attachment-related code is there too. Please try it.
[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840252#comment-13840252 ]

Sergey Beryozkin commented on TIKA-1121:

Can you experiment with the latest snapshots? You may have to build the tika-server manually. Use multipart/form-data payloads, though it might also do better even with regular requests now, as I've removed a call that led to creating a temp FileInputStream.
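For readers who have not built a multipart/form-data payload by hand, the following is a minimal sketch of what such a request body looks like. It is not taken from Tika or CXF. The helper names `build_multipart` and `post_file`, the form field name `upload`, and the example URL are all assumptions made for illustration; check the tika-server build you are running for the actual endpoint and field name.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, payload,
                    content_type="application/octet-stream", boundary=None):
    """Build a single-file multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    # Part header: boundary line, disposition, part content type, blank line.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    # Closing boundary carries a trailing "--".
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_file(url, path):
    """POST one file as multipart/form-data (hypothetical helper).

    The URL and the field name "upload" are assumptions, not confirmed
    by this thread. The whole file is read into memory for simplicity.
    """
    with open(path, "rb") as fh:
        body, header = build_multipart("upload", path, fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A call might look like `post_file("http://localhost:9998/tika/form", "large.pdf")`, where both the port and the path are placeholders for whatever your server actually exposes.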
[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827739#comment-13827739 ]

Mane commented on TIKA-1121:

Is there any update on this? I am trying to parse large HTML/PDF files and the Tika socket server stops responding. I also tried the Tika JAX-RS network server, which throws a 500 Server Error. Can someone look into this, please?
[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827856#comment-13827856 ]

Sergey Beryozkin commented on TIKA-1121:

I can look at offering a CXF-specific extension to the Tika JAX-RS Server to support multiparts. It should fix the issue. CXF is very effective in that it will store big attachments in temporary disk storage if needed.
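The idea of keeping small attachments in memory and moving big ones to temporary disk storage can be sketched as follows. This is an illustration of the general technique only, not CXF's implementation; the class name `SpillBuffer` and the `threshold` parameter are hypothetical.

```python
import io
import tempfile


class SpillBuffer:
    """Hold written bytes in memory until `threshold` is exceeded,
    then move everything to an anonymous temporary file on disk."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._buf = io.BytesIO()
        self.on_disk = False

    def write(self, chunk):
        # Spill once the next chunk would push us past the threshold.
        if not self.on_disk and self._buf.tell() + len(chunk) > self.threshold:
            disk = tempfile.TemporaryFile()
            disk.write(self._buf.getvalue())
            self._buf = disk
            self.on_disk = True
        self._buf.write(chunk)

    def read_all(self):
        # Read from the start, then restore the position for later writes.
        self._buf.seek(0)
        data = self._buf.read()
        self._buf.seek(0, io.SEEK_END)
        return data

    def close(self):
        self._buf.close()
```

Python's standard library offers comparable behaviour in `tempfile.SpooledTemporaryFile`; the hand-written class above only makes the switch from memory to disk explicit.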