Re: Tika 2.0???

2014-12-18 Thread Nick Burch
On Thu, 18 Dec 2014, Allison, Timothy B. wrote: I feel Tika 2.0 coming up soon (well, April-ish?!) and the breaking of some other areas of back compat, esp. parser class loading -> config ... Is it worth creating a wiki page, and use that to track things we want to break compatibility with + ba

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Allison, Timothy B.
Peter, I'm waiting on feedback on TIKA-1497, but rmeta should get you what you want via TIKA-1498. Let us know if there are any surprises. Best, Tim -Original Message- From: Tim Allison (JIRA) [mailto:j...@apache.org] Sent: Thursday, December 18, 20
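
As a rough illustration of consuming /rmeta output, the sketch below assumes the response is a JSON array with one metadata object per document (container first, then embedded files) and that the extracted text sits under the "X-TIKA:content" key. The sample payload and the function name are hypothetical.

```python
import json

# Hypothetical /rmeta-style payload: one metadata object per document.
SAMPLE_RMETA = """
[
  {"Content-Type": "application/pdf", "X-TIKA:content": "Hello from the PDF"},
  {"Content-Type": "image/png", "resourceName": "image0.png", "X-TIKA:content": ""}
]
"""

# Assumed key under which each document's extracted text is stored.
CONTENT_KEY = "X-TIKA:content"


def split_rmeta(payload: str) -> list:
    """Separate each document's metadata from its extracted text."""
    result = []
    for doc in json.loads(payload):
        meta = {k: v for k, v in doc.items() if k != CONTENT_KEY}
        result.append({"meta": meta, "content": doc.get(CONTENT_KEY, "")})
    return result
```

Calling `split_rmeta(SAMPLE_RMETA)` yields one `{"meta": ..., "content": ...}` entry per document, so the container and its embedded files can be handled uniformly.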

Re: Tika 2.0???

2014-12-18 Thread Chris Mattmann
+1 to everything below. My biggest near term goal is 1.7 and we need an answer to integration + metadata on that. Then I think we can address the TODOs including back incompat ones potentially for 2.0. Cheers Tim. Cheers, Chris Chris Mattmann chris.mattm...@gmail.com

RE: Outputting JSON from tika-server/meta

2014-12-18 Thread Allison, Timothy B.
Doh! K, looks like we aren’t loading that in TikaServerCLI. Does anyone know how we’re using MetadataEP? From: Peter Bowyer [mailto:pe...@mapledesign.co.uk] Sent: Thursday, December 18, 2014 10:57 AM To: user@tika.apache.org Subject: Re: Outputting JSON from tika-server/meta On 18 December 2014

RE: Outputting JSON from tika-server/meta

2014-12-18 Thread Allison, Timothy B.
Ha, yes, that is on my ever growing list of todos. That is slightly different, though, from metadata so I’d want to add a separate endpoint. Does the format you get with the -J option on tika-app from 1.7-SNAPSHOT work for you? From: Peter Bowyer [mailto:pe...@mapledesign.co.uk] Sent: Thursd
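
A minimal sketch of driving tika-app's -J option from a script. The jar filename is a placeholder for whatever 1.7-SNAPSHOT build you have, and the helper names are hypothetical. Nothing about the output is assumed beyond it being valid JSON.

```python
import json
import subprocess

# Placeholder path: point this at the tika-app jar you built or downloaded.
TIKA_APP_JAR = "tika-app-1.7-SNAPSHOT.jar"


def tika_app_json_command(path: str, jar: str = TIKA_APP_JAR) -> list:
    """Build the argv for running tika-app with its -J (JSON) option."""
    return ["java", "-jar", jar, "-J", path]


def run_tika_app_json(path: str, jar: str = TIKA_APP_JAR):
    """Run tika-app and parse its stdout as JSON (needs Java and the jar)."""
    completed = subprocess.run(
        tika_app_json_command(path, jar),
        check=True,           # raise if tika-app exits non-zero
        capture_output=True,  # collect stdout/stderr instead of printing them
    )
    return json.loads(completed.stdout)
```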

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Peter Bowyer
On 18 December 2014 at 15:20, Allison, Timothy B. wrote: > > Do you have any luck if you call /metadata instead of /meta? > I have no luck with that: Dec 18, 2014 3:55:21 PM org.apache.cxf.jaxrs.utils.JAXRSUtils findTargetMethod WARNING: No operation matching request path "/metadata" is found, R

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Peter Bowyer
If the API is being modified, could we add an endpoint which will return a combined JSON output, like: { "meta" : { ... }, "content" : { "string of content" } } This would save me making two API calls, fetching each individually and loading the document twice. /unpack does something similar,
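
A client-side sketch of the combined shape Peter describes, assembled from two separately fetched responses. It only illustrates the target format; it does not avoid the second call, which is what a server-side endpoint would buy. Since `{ "string of content" }` is not valid JSON, the sketch uses a plain string for "content". The function name is hypothetical.

```python
import json


def combine_meta_and_content(meta_json: str, content: str) -> str:
    """Merge a metadata JSON object and extracted text into one payload
    of the form {"meta": {...}, "content": "..."}."""
    return json.dumps({"meta": json.loads(meta_json), "content": content})
```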

Tika 2.0???

2014-12-18 Thread Allison, Timothy B.
I feel Tika 2.0 coming up soon (well, April-ish?!) and the breaking of some other areas of back compat, esp. parser class loading -> config ... What other areas for breaking or revamping do others see for 2.0? We need a short-term fix to get the tesseract ocr integration+metadata out the door

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Chris Mattmann
Yeah I think we should probably combine them..and make JSON the default (which unfortunately would break back compat, but in my mind would make a lot more sense) Chris Mattmann chris.mattm...@gmail.com -Original Message- From: "Allison, Timothy B." Reply-To:
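
One way to read "make JSON the default" is as a content-negotiation rule like the hypothetical one below: answer with JSON unless the client explicitly asks for text/csv without also accepting application/json. It ignores q-values to stay short, so treat it as a sketch rather than a complete Accept-header parser.

```python
def choose_metadata_format(accept_header) -> str:
    """Pick "json" or "csv" for a metadata response, defaulting to JSON.

    CSV is chosen only when text/csv is listed and application/json is not;
    media-type parameters such as q-values are stripped and ignored.
    """
    if not accept_header:
        return "json"
    media_types = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    }
    if "text/csv" in media_types and "application/json" not in media_types:
        return "csv"
    return "json"
```

Keeping CSV reachable through an explicit Accept header is one way to soften the back-compat break Chris mentions.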

RE: Outputting JSON from tika-server/meta

2014-12-18 Thread Allison, Timothy B.
Do you have any luck if you call /metadata instead of /meta? That should trigger MetadataEP which will return Json, no? I'm not sure why we have both handlers, but we do... -Original Message- From: Sergey Beryozkin [mailto:sberyoz...@gmail.com] Sent: Thursday, December 18, 2014 9:56 AM

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Sergey Beryozkin
Hi Peter Thanks, you are too nice, it is a minor bug :-) Cheers, Sergey On 18/12/14 14:50, Peter Bowyer wrote: Thanks Sergey, I have opened TIKA-1497 for this enhancement. Best wishes, Peter On 18 December 2014 at 14:31, Sergey Beryozkin <sberyoz...@gmail.com> wrote: Hi, I see M

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Peter Bowyer
Thanks Sergey, I have opened TIKA-1497 for this enhancement. Best wishes, Peter On 18 December 2014 at 14:31, Sergey Beryozkin wrote: > > Hi, > I see MetadataResource returning StreamingOutput and it has > @Produces(text/csv) only. As such this MBW has no effect at the moment. > > We can update
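
Until the server honors a JSON request, one workaround is converting the CSV on the client. This sketch assumes each CSV row is a metadata name followed by one or more values. Check that against your server's actual output before relying on it; the function name is hypothetical.

```python
import csv
import io
import json


def meta_csv_to_json(csv_text: str) -> str:
    """Convert /meta CSV output into a JSON object.

    Assumes each row is: name, value[, value...]. A single value becomes a
    string; several values become a list.
    """
    result = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        name, values = row[0], row[1:]
        result[name] = values[0] if len(values) == 1 else values
    return json.dumps(result)
```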

RE: getLanguage returns "lt" if pdf-file contains only images

2014-12-18 Thread Ken Krugler
Hi Sven, From your email below, it seems like you get 2 characters per page - can you provide details on what those are? Thanks, -- Ken > From: Krüger, Sven > Sent: June 25, 2014 6:22:52am PDT > To: user@tika.apache.org > Subject: getLanguage returns "lt" if pdf-file contains only images > >
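
With only a couple of characters per page, a language profiler likely has almost nothing to work with, so whatever code comes back ("lt" here) is probably noise. A caller could skip detection on such input, as in this hypothetical guard. The 50-letter threshold is an arbitrary assumption, not a Tika setting.

```python
# Arbitrary assumed threshold: below this many letters, skip language detection.
MIN_LETTERS_FOR_DETECTION = 50


def enough_text_for_language_detection(
    text: str, min_letters: int = MIN_LETTERS_FOR_DETECTION
) -> bool:
    """Return True only if the text contains at least `min_letters` letters."""
    letters = sum(1 for ch in text if ch.isalpha())
    return letters >= min_letters
```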

Re: Outputting JSON from tika-server/meta

2014-12-18 Thread Sergey Beryozkin
Hi, I see MetadataResource returning StreamingOutput and it has @Produces(text/csv) only. As such this MBW has no effect at the moment. We can update MetadataResource to return Metadata directly if application/json is requested or update MetadataResource to directly convert Metadata to JSON i
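
A sketch of the second option Sergey mentions (converting Metadata to JSON directly), modeled in Python with a plain dict that maps each name to its list of values. Writing single values as strings and repeated values as arrays is an assumed convention here, not necessarily what tika-server's own writer does.

```python
import json


def metadata_to_json(metadata: dict) -> str:
    """Serialize a multi-valued metadata map (name -> list of values) as JSON.

    One value is written as a string, several as an array; keys are sorted
    so the output is stable across runs.
    """
    flattened = {
        name: values[0] if len(values) == 1 else list(values)
        for name, values in metadata.items()
    }
    return json.dumps(flattened, sort_keys=True)
```

For example, `{"Content-Type": ["application/pdf"], "dc:creator": ["Alice", "Bob"]}` serializes to `{"Content-Type": "application/pdf", "dc:creator": ["Alice", "Bob"]}`.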

Outputting JSON from tika-server/meta

2014-12-18 Thread Peter Bowyer
Hi, I suspect this has a really simple answer, but it's eluding me. How do I get the response from curl -X PUT -T /path/to/file.pdf http://localhost:9998/meta to be JSON and not CSV? I've discovered JSONMessageBodyWriter.java ( https://github.com/apache/tika/blob/af19f3ea04792cad81b428f1df9f5ebb
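
For reference, a client sketch that asks for JSON through the Accept header, the equivalent of adding `-H "Accept: application/json"` to the curl command above. Per Sergey's reply, MetadataResource at this point only produced text/csv, so the header helps only on a server version that honors it. The URL is the default from the question, and the helper names are hypothetical.

```python
import urllib.request

# Default tika-server address taken from the question; adjust as needed.
TIKA_META_URL = "http://localhost:9998/meta"


def build_meta_request(body: bytes, url: str = TIKA_META_URL) -> urllib.request.Request:
    """Build a PUT request for /meta that asks for JSON via the Accept header."""
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def fetch_meta(path: str, url: str = TIKA_META_URL) -> str:
    """Upload the file at `path` and return the response body as text.

    Needs a running tika-server; a server that ignores Accept may still reply with CSV.
    """
    with open(path, "rb") as f:
        request = build_meta_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```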