[ 
https://issues.apache.org/jira/browse/TIKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633610#comment-14633610
 ] 

Tim Allison edited comment on TIKA-1690 at 7/20/15 1:43 PM:
------------------------------------------------------------

Is the problem {{is.available()}}?

{noformat}
        if(is.available() == 0 && !"".equals(fileUrl)){
            Metadata metadata = new Metadata();
            return TikaInputStream.get(new URL(fileUrl), metadata);
        }
{noformat}

If that's what's causing the problem, drop the {{is.available()}} check and 
test fileUrl for null instead, and we should be good to go (aside from our 
offline discussion), no?

In short, if someone sends in a fileUrl in the header, use that and ignore the 
input stream.
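
A minimal sketch of that check, assuming nothing beyond the JDK: the class and 
method names ({{FileUrlCheck}}, {{useFileUrl}}, {{streamReportingZero}}) are 
hypothetical and are not part of the patch or of tika-server. It illustrates 
two points. First, the {{InputStream}} contract only makes {{available()}} an 
estimate, so a stream can report 0 while still holding data. Second, 
{{!"".equals(fileUrl)}} evaluates to true when fileUrl is null, so an explicit 
null check is needed before building a URL from it.

```java
import java.io.ByteArrayInputStream;

public class FileUrlCheck {

    // Hypothetical helper: decide from the header value alone whether to
    // fetch from fileUrl, without consulting is.available().
    static boolean useFileUrl(String fileUrl) {
        return fileUrl != null && !fileUrl.isEmpty();
    }

    // Hypothetical stand-in for a request body stream that holds data but
    // reports available() == 0, which the InputStream contract permits.
    static ByteArrayInputStream streamReportingZero(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    public static void main(String[] args) {
        ByteArrayInputStream is = streamReportingZero(new byte[] {1, 2, 3});

        // available() says 0, yet read() still returns the first byte.
        System.out.println("available() = " + is.available());
        System.out.println("read()      = " + is.read());

        // The check from the patch passes for a null fileUrl ...
        String fileUrl = null;
        System.out.println("!\"\".equals(null) = " + !"".equals(fileUrl));

        // ... while the explicit null/empty check rejects it.
        System.out.println("useFileUrl(null) = " + useFileUrl(fileUrl));
        System.out.println("useFileUrl(url)  = " + useFileUrl("http://example.com/a.tif"));
    }
}
```

In the real handler, the true branch would presumably keep the existing 
{{TikaInputStream.get(new URL(fileUrl), metadata)}} call and the false branch 
would fall back to the request body.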

But since that portion seems to have nothing to do with the main patch, it 
might be best to revert it and open a separate ticket to add that functionality 
thoroughly (to all handlers), along with some other necessary items.


> Inconsistent (buggy) behavior when using tika-server 
> -----------------------------------------------------
>
>                 Key: TIKA-1690
>                 URL: https://issues.apache.org/jira/browse/TIKA-1690
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Namrata Malarout
>            Assignee: Tim Allison
>              Labels: memex, tika-python
>             Fix For: 1.10
>
>
> I am using Tika trunk (1.10-SNAPSHOT) and posting documents there. An example 
> would be the following:
> curl -T MOD09GA.A2014010.h30v12.005.2014012183944.vegetation_fraction.tif
> http://localhost:9998/meta --header "Accept: application/json"
> …
> curl -T MOD09GA.A2014010.h30v12.005.2014012183944.vegetation_fraction.tif
> http://localhost:9998/meta --header "Accept: application/rdf+xml"
> …
> curl -T MOD09GA.A2014010.h30v12.005.2014012183944.vegetation_fraction.tif
> http://localhost:9998/meta --header "Accept: text/csv"
> I am using a python script to iterate through all the files in a folder. It 
> works for about 50% to 80% of the files. For the rest it gives an error 500. 
> When I post a file individually for which it previously failed (using the 
> python script) it sometimes works. When done in an ad hoc manner, it works 
> most of the time but fails sometimes. At times it is successful for 
> application/rdf+xml format but fails for application/json format. The 
> behavior is inconsistent.
> Here is an example trace of when it does not work as expected [0]
> A sample of the data being used can be found here [1]
> Any help would be appreciated. 
> [0] https://paste.apache.org/lbAm
> [1] 
> https://drive.google.com/file/d/0B6wmo4_-H0P2eWJjdTdtYS1HRGs/view?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)