[jira] [Commented] (TIKA-1202) Refactor PDFParser to enable easier parameter setting

2013-12-04 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838732#comment-13838732
 ] 

Hong-Thai Nguyen commented on TIKA-1202:
----------------------------------------

+1 for me.
Thanks

 Refactor PDFParser to enable easier parameter setting
 -----------------------------------------------------

              Key: TIKA-1202
              URL: https://issues.apache.org/jira/browse/TIKA-1202
          Project: Tika
       Issue Type: Improvement
       Components: parser
 Affects Versions: 1.5
         Reporter: Tim Allison
         Assignee: Tim Allison
         Priority: Trivial
      Attachments: TIKA-1202.patch


 It would be handy to be able to set PDFParser parameters 
 (extractAnnotationText, etc) in a config file and via ParseContext.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (TIKA-1199) Tika extracts weird signs instead of text

2013-12-04 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838856#comment-13838856
 ] 

Tim Allison commented on TIKA-1199:
-----------------------------------

Doh!  Duplicated Marc's PDFBOX-1783.  Sorry about that.

 Tika extracts weird signs instead of text
 -----------------------------------------

              Key: TIKA-1199
              URL: https://issues.apache.org/jira/browse/TIKA-1199
          Project: Tika
       Issue Type: Bug
       Components: parser
 Affects Versions: 1.4
      Environment: MacOSX, Linux
         Reporter: Marc Teutelink
      Attachments: gaat fout.pdf,
                   plain_text_tika_output_from_gaat_fout_pdf.txt,
                   structured_text_tika_output_from_gaat_fout_pdf.xml


 Tika extracts completely bogus text from the attached document. I have attached 
 the PDF in question, along with the plain and structured text output from 
 Tika.





[jira] [Commented] (TIKA-1196) JAX-RS server only responds to queries to/from http://localhost

2013-12-04 Thread Rian Stockbower (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839706#comment-13839706
 ] 

Rian Stockbower commented on TIKA-1196:
---------------------------------------

Radio silence from the Tika mailing list. Can we get my latest patch rolled in?

 JAX-RS server only responds to queries to/from http://localhost
 ---------------------------------------------------------------

              Key: TIKA-1196
              URL: https://issues.apache.org/jira/browse/TIKA-1196
          Project: Tika
       Issue Type: Bug
       Components: server
 Affects Versions: 1.4
      Environment: Mac OS X, Windows Server 2008
         Reporter: Rian Stockbower
         Priority: Minor
           Labels: JAXRS, hostname, web-service
      Attachments: tika-1196.patch, tika-1196b.patch, tika-1196c.patch


 I'm not sure if this is a problem with the Tika JAX-RS server, or with how it 
 uses CXF under the hood. Anyway:
 I have a large text extraction job (10-15 million documents) that I'm using 
 the web service for. It would be nice to be able to distribute this 
 horizontally across multiple nodes to speed up the processing. My plan was 
 to have a job queue with a couple of consumers, farming out PUT requests across 
 several Tika web service endpoints.
 But the JAX-RS web service will only respond to queries made to 
 {{http://localhost:9998/tika}}.
 I can't call {{http://hostname:9998/tika}} -- even if it's still a local 
 operation.
 Here is a list of things I've tried:
 * I changed line 89 of TikaServerCLI.java to compute the name of the host at 
 runtime. No go: the server starts up, and immediately terminates.
 * I changed line 89 of TikaServerCLI.java to be a hostname (not a FQDN), and 
 re-compiled:
 ** {{mvn compile -rf :tika-server}} compiles successfully. Start up the 
 server, and it terminates, just like when I tried to compute the hostname at 
 runtime.
 ** {{mvn install}} from the topmost Tika directory gets the service 
 responding to both {{http://hostname:9998/tika}} and 
 {{http://hostname.domain.net:9998/tika}}. (This seemed weird, which is why I 
 suspected the problem was further up the chain, in CXF.)
 In a perfect world:
 # The server should respond to any valid calls that make sense:
 #* 127.0.0.1
 #* localhost
 #* hostname
 #* host.domain.tld
 #* ip_address
 # A {{hostname}} invocation parameter could be used to limit what the service 
 responds to when it's started up. (A very optional, nice-to-have.)
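
 For anyone following along, the wish list above is consistent with ordinary 
 socket bind behaviour: a listener bound to the loopback address only accepts 
 connections addressed to loopback, while one bound to the wildcard address 
 accepts connections addressed to any of the machine's addresses. The sketch 
 below illustrates that with plain Python sockets. It is not Tika or CXF code; 
 the helper names {{open_listener}} and {{is_reachable}} are made up for this 
 illustration, and whether this is the actual cause of the behaviour reported 
 here is an assumption.

```python
import socket


def open_listener(bind_host: str) -> socket.socket:
    """Listen on bind_host; port 0 asks the OS for any free port.

    Hypothetical helper for illustration only (not part of Tika or CXF).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((bind_host, 0))
    listener.listen(5)
    return listener


def is_reachable(dest_host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to dest_host:port is accepted."""
    try:
        with socket.create_connection((dest_host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not reachable".
        return False


if __name__ == "__main__":
    # Compare a loopback-only listener with a wildcard listener.
    for bind_host in ("127.0.0.1", "0.0.0.0"):
        with open_listener(bind_host) as listener:
            port = listener.getsockname()[1]
            print(bind_host, "reachable via 127.0.0.1:",
                  is_reachable("127.0.0.1", port))
```

 To probe the other cases in the list, pass the machine's hostname or 
 non-loopback IP address to {{is_reachable}}. A listener bound to 
 {{127.0.0.1}} would be expected to refuse those, while one bound to 
 {{0.0.0.0}} would be expected to accept them, subject to firewall rules.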


