[
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726123#action_12726123
]
Chris Harris commented on SOLR-284:
-----------------------------------
{quote}
bq. My only request is that, if you're changing how field mapping works and
maybe removing ext.ignore.und.fl, you make sure it stays easy to say, "Tika, I
don't care about any of your parsed metadata."
Map unknown fields to an ignored fieldtype.
uprefix=ignored_
{quote}
That seems fine.
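For illustration, here is a minimal Python sketch of a request that follows that advice. The Solr base URL, the `/update/extract` handler path, and the `literal.id` parameter are assumptions for this example, not taken from the SOLR-284 patch. It also assumes the schema maps `ignored_*` to an unindexed, unstored "ignored" field type, as the quoted reply suggests.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; not taken from the SOLR-284 patch.
SOLR_BASE = "http://localhost:8983/solr"
EXTRACT_HANDLER = "/update/extract"


def extract_url(doc_id, ignore_metadata=True,
                base=SOLR_BASE, handler=EXTRACT_HANDLER):
    """Build the URL for posting one rich document to the extraction handler.

    With ignore_metadata=True, uprefix=ignored_ asks Solr to prefix every
    field name it does not recognize (e.g. Tika metadata) with "ignored_".
    This only discards the data if the schema maps ignored_* to an
    unindexed, unstored field type (an assumption here).
    """
    params = {"literal.id": doc_id}  # literal.id: assumed name for the id param
    if ignore_metadata:
        params["uprefix"] = "ignored_"
    return base + handler + "?" + urlencode(params)


if __name__ == "__main__":
    print(extract_url("doc1"))
```

The document bytes themselves would go in the POST body; that part is left out to keep the sketch focused on the parameter.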
Tangentially, I wonder how fast Tika's metadata extraction is, compared to its
main body text extraction. If the latter doesn't dwarf the former, there might
be value in adding a "Solr, don't even ask Tika to compute metadata at
all; just have it extract the body text" flag; this could potentially speed
things up for people who don't need the metadata. Maybe it would make sense to
benchmark things before adding such a flag, though. I also don't have a good
sense of how many people will want to use the metadata feature vs how many
don't.
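A minimal Python sketch of the kind of benchmark suggested here: time two extraction callables on the same sample and compare median timings. The two extractor functions in the usage are hypothetical stand-ins; a real measurement would plug in calls into Tika. One caveat: Tika's parsers normally fill in the body text and the `Metadata` object in the same parse call, so a genuinely body-only variant may be hard to isolate.

```python
import statistics
import time


def median_seconds(fn, arg, repeats=5):
    """Call fn(arg) `repeats` times; return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(body_only, body_and_metadata, sample, repeats=5):
    """Return (body-only time, full time, full/body ratio) for one sample.

    A ratio near 1.0 would suggest metadata extraction is cheap next to body
    extraction, so a "skip metadata" flag would save little.
    """
    t_body = median_seconds(body_only, sample, repeats)
    t_full = median_seconds(body_and_metadata, sample, repeats)
    ratio = t_full / t_body if t_body > 0 else float("inf")
    return t_body, t_full, ratio


# Hypothetical stand-ins; a real benchmark would call Tika here instead.
def fake_body_only(data):
    return data.decode("utf-8", "ignore")


def fake_body_and_metadata(data):
    text = data.decode("utf-8", "ignore")
    metadata = {"Content-Length": str(len(data))}  # made-up metadata
    return text, metadata


if __name__ == "__main__":
    t_body, t_full, ratio = compare(fake_body_only, fake_body_and_metadata,
                                    b"hello world " * 10000)
    print(f"body only: {t_body:.6f}s  body+metadata: {t_full:.6f}s  ratio: {ratio:.2f}")
```

Using the median rather than the mean keeps one slow run (e.g. a cold cache) from skewing the comparison.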
> Parsing Rich Document Types
> ---------------------------
>
> Key: SOLR-284
> URL: https://issues.apache.org/jira/browse/SOLR-284
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Eric Pugh
> Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: libs.zip, rich.patch, rich.patch, rich.patch,
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch,
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch,
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip,
> test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler
> that supports streaming a PDF, Word, PowerPoint, or Excel document into
> Solr.
> There is a wiki page with information here:
> http://wiki.apache.org/solr/UpdateRichDocuments
>
--
This message is automatically generated by JIRA.