[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650353#action_12650353
 ] 

Hoss Man commented on SOLR-284:
-------------------------------

bq. if Tika returns a metadata field and you haven't made an explicit mapping 
from the Tika fieldname to your Solr fieldname, then Solr will throw an 
exception and your document add will fail. This doesn't sound very robust 
for a production environment, unless Tika will only ever use a finite list of 
metadata field names.

I'm not familiar with the state of the patch, but I'm assuming that (by 
default) all of the metadata fields produced by Tika have a common naming 
convention -- either a common prefix or a common suffix.  In that case, 
people can always add a dynamicField declaration to ignore all metadata 
fields not already explicitly declared.
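
A minimal sketch of that idea, assuming the Tika metadata fields arrive with 
a common prefix. The prefix "meta_" is hypothetical and is not taken from the 
patch. The field type name "ignored" is likewise only a label chosen here:

```xml
<!-- schema.xml fragment (sketch only; the "meta_" prefix is an assumption) -->

<!-- Goes in the <types> section: a field type that is neither indexed nor
     stored, so any value sent to it is accepted and then dropped. -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>

<!-- Goes in the <fields> section: catch every metadata field that has no
     explicit <field> declaration of its own. -->
<dynamicField name="meta_*" type="ignored"/>
```

Explicitly declared fields take precedence over dynamicField patterns, so a 
field such as meta_title (again a hypothetical name) could still be declared 
normally and indexed, while undeclared metadata fields would be silently 
discarded instead of causing the add to fail.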

> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284.patch, 
> SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, test-files.zip, 
> test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, PowerPoint, or Excel document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
