[jira] Created: (SOLR-1847) Solrj doesn't know if PDF was actually parsed by Tika

elsadek (JIRA) Fri, 26 Mar 2010 01:51:08 -0700

Solrj doesn't know if PDF was actually parsed by Tika
-----------------------------------------------------


                 Key: SOLR-1847
                 URL: https://issues.apache.org/jira/browse/SOLR-1847
             Project: Solr
          Issue Type: Bug
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 1.5
         Environment: Tomcat 6.0.24, Solr 1.5Dev, SolrJ 1.5Dev, Tika
            Reporter: elsadek


When posting PDF files using SolrJ, the only response we get from Solr is the
server response status, so we never know whether the PDF was actually parsed.
Checking the log, I found that Tika failed on some PDF files, either because
of the nature of their content (text present only in images) or because the
files are corrupted:
    
     25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine processOperator
     INFO: unsupported/disabled operation: EI
   
     25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
     GRAVE: Stop reading corrupt stream


The question is: how can I catch these kinds of exceptions through SolrJ?
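
One possible client-side check, offered as a rough sketch rather than a fix:
Solr Cell accepts an extractOnly=true request parameter that returns the Tika
output without indexing it, so the client could inspect that output and flag
files that yielded no text. The helper below covers only that last step.
ExtractionCheck, looksUnparsed, unparsedFiles and the file names are
hypothetical and not part of SolrJ, and the code assumes the extracted content
has already been obtained as plain text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of SolrJ) that flags empty extraction results. */
final class ExtractionCheck {
    private ExtractionCheck() {}

    /** Returns true when the extracted text is null or contains only whitespace. */
    static boolean looksUnparsed(String extractedText) {
        return extractedText == null || extractedText.trim().isEmpty();
    }

    /** Returns the names of files whose extracted text looks empty, in input order. */
    static List<String> unparsedFiles(Map<String, String> extractedByFile) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, String> entry : extractedByFile.entrySet()) {
            if (looksUnparsed(entry.getValue())) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Assumed input: file name -> plain text returned by an extract-only request.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("report.pdf", "Quarterly results ...");
        extracted.put("scanned.pdf", "   \n");  // image-only PDF, no text layer
        extracted.put("broken.pdf", null);      // nothing came back at all
        System.out.println("Possibly unparsed: " + unparsedFiles(extracted));
    }
}
```

An empty result does not say why extraction failed; the PDFBox messages shown
above would still have to be read from the server log.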



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
