RE: Query on correct use of 'fileUrl' in TikaJAXRS Server to extract document at remote url - my request is not working

John Dougrez-Lewis Mon, 12 Sep 2016 11:53:47 -0700

Thanks. It looked like a very useful feature, and it would be worth
reinstating if the security vulnerability could be patched.



-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]] 
Sent: 12 September 2016 14:42
To: [email protected]
Subject: RE: Query on correct use of 'fileUrl' in TikaJAXRS Server to
extract document at remote url - my request is not working

I think fileUrl only existed in 1.9.  We removed it because it introduced a
security vulnerability
(http://www.openwall.com/lists/oss-security/2015/08/13/5).  

I just updated the wiki.  Sorry!
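
For readers who relied on fileUrl, one possible workaround is to fetch the
document on the client and stream it to the server as the request body. The
following is only a minimal sketch, assuming a Tika server at
http://localhost:9998 that accepts a PUT body on /tika (the endpoint used in
the commands quoted below). The function name tika_extract_url is
hypothetical and is not part of Tika or curl.

```shell
# Hypothetical helper, not part of Tika: fetch a remote document on the
# client side, then PUT its bytes to the Tika server.
# $1 = document URL, $2 = Tika server base URL (assumed default below).
tika_extract_url() {
  doc_url="$1"
  tika_server="${2:-http://localhost:9998}"
  # First curl: download the document (-f fails on HTTP errors, -L follows
  # redirects). Second curl: -T - uploads stdin, which curl sends as a PUT.
  curl -sSfL "$doc_url" |
    curl -sS -T - -H "Accept: text/plain" "$tika_server/tika"
}
```

Usage would look like: tika_extract_url http://www.bbc.co.uk/news
Because the fetch happens on the client, the server never has to dereference
a caller-supplied URL.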


-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]] 
Sent: Monday, September 12, 2016 6:27 AM
To: [email protected]
Subject: Re: Query on correct use of 'fileUrl' in TikaJAXRS Server to
extract document at remote url - my request is not working

Hi, can you do me a favor and paste the -v output?

-H identifies a request header, I wonder if it should be

curl -i fileUrl:http://www.bbc.co.uk/news -H "Accept: text/plain" -X PUT
http://localhost:9998/tika

?
(though I've never used this option)

Thanks, Sergey

On 11/09/16 09:48, John Dougrez-Lewis wrote:
> Apologies. There were post-run copy-and-paste errors below, which added 
> the URI formatting.
>
>
>
> Actual commands were:
>
>
>
> curl -i -H "fileUrl:http://www.bbc.co.uk/news" -H "Accept: text/plain"
> -X PUT http://localhost:9998/tika
>
> HTTP/1.1 415 Unsupported Media Type
>
> Content-Length: 0
>
> Server: Jetty(8.y.z-SNAPSHOT)
>
>
>
>
>
> curl -i -H "fileUrl:http://www.bbc.co.uk/news" -H "Accept: application/json"
> -X PUT http://localhost:9998/meta
>
> HTTP/1.1 415 Unsupported Media Type
>
> Content-Length: 0
>
> Date: Sun, 11 Sep 2016 08:46:30 GMT
>
> Server: Jetty(8.y.z-SNAPSHOT)
>
>
>
>
>
>
>
> *From:*John Dougrez-Lewis [mailto:[email protected]]
> *Sent:* 11 September 2016 09:04
> *To:* [email protected]
> *Subject:* Query on correct use of 'fileUrl' in TikaJAXRS Server to 
> extract document at remote url - my request is not working
>
>
>
> Hi,
>
>
>
> I'm trying to follow the instructions at the end of 
> http://wiki.apache.org/tika/TikaJAXRS to extract a web page from a 
> remote website.
>
>
>
> I'd like to see the Tika results for the URL 
> *http://http://www.bbc.co.uk/news <http://http:/www.bbc.co.uk/news> *, 
> but when I run the following commands, I get the following errors:
>
>
>
>
>
> *curl -i  -H "fileUrl:http://http://www.bbc.co.uk/news
> <http://http:/www.bbc.co.uk/news>"  -H "Accept: text/plain" -X PUT
> http://localhost:9998/meta*
>
>
>
> HTTP/1.1 406 Not Acceptable
>
> Content-Length: 0
>
> Date: Sun, 11 Sep 2016 07:40:24 GMT
>
> Server: Jetty(8.y.z-SNAPSHOT)
>
>
>
>
>
> *curl -i  -H "fileUrl:http://http://www.bbc.co.uk/news
> <http://http:/www.bbc.co.uk/news>"  -H "Accept: text/plain" -X PUT
> http://localhost:9998/meta*
>
>
>
> HTTP/1.1 415 Unsupported Media Type
>
> Content-Length: 0
>
> Date: Sun, 11 Sep 2016 07:38:43 GMT
>
> Server: Jetty(8.y.z-SNAPSHOT)
>
>
>
> How do I correctly invoke curl to perform the PUT to the Tika Server 
> to get a valid response for the remote url ?
>
>
>
> I'm using:
>
> Apache Tika 1.13 Server
>
> curl 7.40.0 (i386-pc-win32) libcurl/7.40.0 OpenSSL/1.0.0o zlib/1.2.8
>
> on Windows Server 2003 R2 sp2
>
>
>
> Thanks,
>
>
>
> John
>
>
>
