Thanks. It looked like a very useful feature, and it would be worth reinstating if the security vulnerability could be patched.
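
For anyone landing on this thread with a server that no longer supports fileUrl: one possible workaround is to download the document yourself and send its bytes as the PUT body. A minimal sketch in shell, assuming tika-server is listening on localhost:9998 as in the commands quoted below; tika_extract_url is a hypothetical helper name, not something Tika provides:

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): fetch a remote document ourselves,
# then PUT its bytes to a local tika-server for extraction.
# Assumption: the server is reachable at http://localhost:9998.
tika_extract_url() {
    url="$1"
    endpoint="${2:-http://localhost:9998/tika}"
    # First curl downloads the document (-s quiet, -L follow redirects).
    # Second curl uploads stdin as the request body (-T - implies PUT).
    curl -sL "$url" | curl -s -T - -H "Accept: text/plain" "$endpoint"
}
```

Calling tika_extract_url http://www.bbc.co.uk/news should then return the extracted text. Because the fetch happens on the client side, the server never has to open URLs on a caller's behalf, which is the behaviour the advisory objected to.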

-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: 12 September 2016 14:42
To: [email protected]
Subject: RE: Query on correct use of 'fileUrl' in TikaJAXRS Server to extract document at remote url - my request is not working

I think fileUrl only existed in 1.9. We removed it because it introduced a security vulnerability (http://www.openwall.com/lists/oss-security/2015/08/13/5). I just updated the wiki. Sorry!

-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Monday, September 12, 2016 6:27 AM
To: [email protected]
Subject: Re: Query on correct use of 'fileUrl' in TikaJAXRS Server to extract document at remote url - my request is not working

Hi, can you do me a favor and paste the -v output?

-H identifies a request header, so I wonder if it should be

curl -i fileUrl:http://www.bbc.co.uk/news -H "Accept: text/plain" -X PUT http://localhost:9998/tika

? (though I've never used this option)

Thanks, Sergey

On 11/09/16 09:48, John Dougrez-Lewis wrote:
> Apologies. There were post-run copy-and-paste errors below, introduced
> when adding the URI formatting.
>
> Actual commands were:
>
> curl -i -H "fileUrl:http://www.bbc.co.uk/news" -H "Accept: text/plain" -X PUT http://localhost:9998/tika
>
> HTTP/1.1 415 Unsupported Media Type
> Content-Length: 0
> Server: Jetty(8.y.z-SNAPSHOT)
>
> curl -i -H "fileUrl:http://www.bbc.co.uk/news" -H "Accept: application/json" -X PUT http://localhost:9998/meta
>
> HTTP/1.1 415 Unsupported Media Type
> Content-Length: 0
> Date: Sun, 11 Sep 2016 08:46:30 GMT
> Server: Jetty(8.y.z-SNAPSHOT)
>
> *From:* John Dougrez-Lewis [mailto:[email protected]]
> *Sent:* 11 September 2016 09:04
> *To:* [email protected]
> *Subject:* Query on correct use of 'fileUrl' in TikaJAXRS Server to
> extract document at remote url - my request is not working
>
> Hi,
>
> I'm trying to follow the instructions at the end of
> http://wiki.apache.org/tika/TikaJAXRS to extract a web page from a
> remote website.
>
> I'd like to see the Tika results for the URL
> *http://http://www.bbc.co.uk/news <http://http:/www.bbc.co.uk/news>*,
> but when I run the following commands, I get the following errors:
>
> *curl -i -H "fileUrl:http://http://www.bbc.co.uk/news <http://http:/www.bbc.co.uk/news>" -H "Accept: text/plain" -X PUT http://localhost:9998/meta*
>
> HTTP/1.1 406 Not Acceptable
> Content-Length: 0
> Date: Sun, 11 Sep 2016 07:40:24 GMT
> Server: Jetty(8.y.z-SNAPSHOT)
>
> *curl -i -H "fileUrl:http://http://www.bbc.co.uk/news <http://http:/www.bbc.co.uk/news>" -H "Accept: text/plain" -X PUT http://localhost:9998/meta*
>
> HTTP/1.1 415 Unsupported Media Type
> Content-Length: 0
> Date: Sun, 11 Sep 2016 07:38:43 GMT
> Server: Jetty(8.y.z-SNAPSHOT)
>
> How do I correctly invoke curl to perform the PUT to the Tika Server
> to get a valid response for the remote url?
>
> I'm using:
>
> Apache Tika 1.13 Server
> curl 7.40.0 (i386-pc-win32) libcurl/7.40.0 OpenSSL/1.0.0o zlib/1.2.8
> on Windows Server 2003 R2 sp2
>
> Thanks,
>
> John
