Hi
On 28/03/12 06:24, Mattmann, Chris A (388J) wrote:
Hi Folks,
Sorry to bother, but I am having a heck of a time migrating away from Jersey to
CXF in Tika-ville. Please see TIKA-593 for reference.
No problem, thanks for posting this information.
Right now, here:
http://svn.apache.org/repos/asf/tika/trunk/tika-server
I have all tests passing, having migrated away from Jersey and onto CXF, save one test:
Xtest415 in UnpackageResourceTest
(I disabled it for now since it won't pass).
In our old Jersey setup, accessible from ViewVC here:
http://s.apache.org/jKU
you can see the former Jersey test harness for test415, which passed fine.
It seems to have something to do with:
1. The Accept header, and this thread:
http://s.apache.org/LMR
Apparently Jersey did not require us to specify */*, which I think it used
by default.
2. Jersey's behavior on the unknown "xxx/xxx" MediaType: I don't think Jersey
let it through (it threw back HTTP 415), but for whatever reason CXF lets it
through and it reaches the Tika AutoDetectParser, which barfs.
UnpackerResource does not have any specific requirements about the
incoming Content-Type; it has no explicit @Consumes.
415 is returned if the Content-Type is not supported, but right now
UnpackerResource effectively has @Consumes("*/*"). I would not be
surprised if Jersey also assumed the default wildcard; otherwise, given
that UnpackerResource has no explicit @Consumes, it would not be
possible to POST/PUT to it with Jersey clients.
"xxx/xxx" is a syntactically valid MediaType, so CXF parses it and lets it through.
I think it really boils down to whether "xxx/xxx" can be treated as a
valid media type or not. We can add a check for "xxx/xxx", but then
someone will set "text/123" - that is well-formed enough, but I guess not
something the Tika server wants to accept :-)
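To illustrate why format alone cannot trigger the 415: a minimal sketch, assuming only the type "/" subtype token grammar from the HTTP spec (parameters after ";" are ignored, and the class name MediaTypeSyntax is hypothetical, not a CXF or Tika API). A purely syntactic check accepts "xxx/xxx" and "text/123" just as readily as "application/zip".

```java
import java.util.regex.Pattern;

public class MediaTypeSyntax {
    // HTTP "token" characters: visible ASCII minus separators.
    private static final String TOKEN = "[!#$%&'*+.^_`|~0-9A-Za-z-]+";
    // type "/" subtype; any ";param=value" part is stripped before matching.
    private static final Pattern TYPE_SUBTYPE =
        Pattern.compile("^(" + TOKEN + ")/(" + TOKEN + ")$");

    static boolean isSyntacticallyValid(String contentType) {
        String base = contentType.split(";", 2)[0].trim();
        return TYPE_SUBTYPE.matcher(base).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"application/zip", "xxx/xxx", "text/123", "not a type"};
        for (String ct : samples) {
            System.out.println(ct + " -> " + isSyntacticallyValid(ct));
        }
    }
}
```

Only the last sample fails, because it has no "/" at all; deciding that "xxx/xxx" is unwanted needs a list of supported types, not a parser.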
I guess I can update UnpackerResource to guard against this and throw HTTP
415 itself, but that seems a bit intrusive, and I was really hoping for more of a
seamless Jar/Maven drop-in fix.
I think the proper fix is to explicitly list the supported content types,
using wildcards where possible, for example image/*, or a/b+*, etc.
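A simplified sketch of what an explicit list buys. The SUPPORTED list is hypothetical (not Tika's real set of unpackable types), and matches() is a stand-in for the JAX-RS runtime's own @Consumes matching, not CXF's actual algorithm. A subtype ending in "*" is treated as a prefix, which is one assumed reading of patterns like a/b+*. With only "*/*" declared every well-formed type matches; with an explicit list, "xxx/xxx" falls through to 415.

```java
import java.util.List;
import java.util.Locale;

public class ConsumesCheck {
    // Hypothetical declared types, for illustration only (not from Tika).
    static final List<String> SUPPORTED = List.of(
        "application/zip",
        "application/x-tar",
        "image/*",
        "application/vnd.openxmlformats-*");

    // True if the concrete Content-Type matches the declared pattern.
    // "*" as type matches any type; a subtype ending in "*" is a prefix match,
    // so "*" alone matches any subtype. Parameters after ";" are ignored.
    static boolean matches(String pattern, String actual) {
        String[] p = pattern.toLowerCase(Locale.ROOT).split("/", 2);
        String[] a = actual.toLowerCase(Locale.ROOT).split(";", 2)[0].trim().split("/", 2);
        if (p.length != 2 || a.length != 2) {
            return false;
        }
        boolean typeOk = p[0].equals("*") || p[0].equals(a[0]);
        boolean subtypeOk = p[1].endsWith("*")
            ? a[1].startsWith(p[1].substring(0, p[1].length() - 1))
            : p[1].equals(a[1]);
        return typeOk && subtypeOk;
    }

    // 200 if any declared pattern accepts the Content-Type, otherwise 415.
    static int statusFor(String contentType) {
        for (String pattern : SUPPORTED) {
            if (matches(pattern, contentType)) {
                return 200;
            }
        }
        return 415; // Unsupported Media Type
    }

    public static void main(String[] args) {
        String[] samples = {"image/png", "application/zip; charset=UTF-8", "xxx/xxx", "text/123"};
        for (String ct : samples) {
            System.out.println(ct + " -> " + statusFor(ct));
        }
    }
}
```

In the real resource the list would sit in an @Consumes annotation and the runtime would do the matching and return the 415; the helper here only makes that decision visible.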
Any guidance, help, or context, in Tika-ville or here, would be welcome.
Please feel free to CC me directly on replies, [email protected], and [email protected],
since we are not subscribed to the CXF users list.
CC-ed,
Thanks, Sergey
Thanks!
Cheers,
Chris
P.S. Thanks for the help on IRC today, Sergey.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--
Sergey Beryozkin
Talend Community Coders
http://coders.talend.com/
Blog: http://sberyozkin.blogspot.com