[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725355#action_12725355 ]
Grant Ingersoll commented on SOLR-284:
--------------------------------------
bq. ext.ignore.und.fl
I think this should be kept; this is a case where we should silently ignore
undeclared fields. Parsing rich data is a different beast than normal Solr XML
or other structured content. There are many times when you only want specific
fields, and a document can carry a large number of them. It is burdensome to
have to add ignores for all the metadata, not to mention that different
document types may have different metadata. So, -1 on removing.
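To make the silent-ignore case concrete, here is a minimal sketch in Python of
the behavior being argued for. It is not Solr or Tika code: the function name
filter_undeclared, the schema_fields set, and the sample metadata are all
hypothetical, and the ignore_undeclared flag merely stands in for
ext.ignore.und.fl.

```python
def filter_undeclared(extracted, schema_fields, ignore_undeclared=True):
    """Keep only the extracted fields that the schema declares.

    With ignore_undeclared=True, unknown fields are dropped silently.
    With ignore_undeclared=False, the first unknown field raises KeyError.
    """
    kept = {}
    for name, value in extracted.items():
        if name in schema_fields:
            kept[name] = value
        elif not ignore_undeclared:
            raise KeyError(f"undeclared field: {name}")
    return kept


# Hypothetical metadata, roughly what a rich-document parser might emit.
extracted = {
    "title": "Quarterly Report",
    "author": "Jane Doe",
    "Last-Printed": "2009-06-01",
    "Page-Count": "12",
}
schema_fields = {"title", "author", "text"}

doc = filter_undeclared(extracted, schema_fields)
```

Without the flag, the caller would have to list an explicit ignore for
Last-Printed, Page-Count, and every other metadata name a given file type
happens to produce.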
bq. ext.idx.attr
Yes, we may want it to be false. That's why I put it in! :-) It controls
whether things like HREF attributes are extracted into their own fields.
Think faceting.
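As a rough illustration of the idea (not the handler's actual code), the
Python sketch below pulls attribute values such as href out of markup into
their own multi-valued fields, which could then be faceted on. The
AttrCollector class, the extract function, and the index_attributes flag are
hypothetical stand-ins; only the standard-library html.parser module is real.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects body text and, optionally, attribute values per attribute name."""

    def __init__(self, index_attributes=True):
        super().__init__()
        self.index_attributes = index_attributes
        self.text_parts = []
        self.attr_fields = {}

    def handle_starttag(self, tag, attrs):
        if not self.index_attributes:
            return
        for name, value in attrs:
            if value is not None:
                # One multi-valued field per attribute name, e.g. "href".
                self.attr_fields.setdefault(name, []).append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def extract(html, index_attributes=True):
    collector = AttrCollector(index_attributes)
    collector.feed(html)
    collector.close()
    doc = {"text": " ".join(collector.text_parts)}
    doc.update(collector.attr_fields)
    return doc


sample = '<p>See <a href="http://example.com/a">this</a> and <a href="http://example.com/b">that</a></p>'
with_attrs = extract(sample, index_attributes=True)
without_attrs = extract(sample, index_attributes=False)
```

With the flag off, only the text field is produced, which is the "or not"
half of the option.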
bq. ext.metadata.prefix
This is not a mapping thing so much as a way to handle metadata fields
separately from the main text fields. I'm not sure it differs from the uprefix
approach you are proposing, except that you can know exactly what is metadata
and what isn't.
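A small Python sketch of what separating metadata by prefix could look like.
This is an assumption-laden illustration, not the patch's implementation: the
build_document function, the "text" field name, and the "meta_" prefix value
are all hypothetical.

```python
def build_document(content, metadata, metadata_prefix=""):
    """Put body text in 'text' and every metadata field under the given prefix."""
    doc = {"text": content}
    for name, value in metadata.items():
        doc[metadata_prefix + name] = value
    return doc


def metadata_fields(doc, metadata_prefix):
    """Recover exactly the fields that came from metadata."""
    return {k: v for k, v in doc.items() if k.startswith(metadata_prefix)}


doc = build_document(
    "Body text",
    {"author": "Jane Doe", "Content-Type": "application/pdf"},
    metadata_prefix="meta_",
)
only_meta = metadata_fields(doc, "meta_")
```

Because the prefix is applied only to metadata, a consumer can tell from the
field name alone which values came from metadata and which from the main text.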
Other questions that Yonik brought up:
1. I don't think trying to auto map is a good idea. New file formats will
bring new conventions, so it's better to have the user handle the mapping.
2. Fine with dropping ext for common names.
3. Metadata is often not useful, and I don't think we need to do the work as
suggested. See Eric's comment above.
4. Enabling by default is fine.
> Parsing Rich Document Types
> ---------------------------
>
> Key: SOLR-284
> URL: https://issues.apache.org/jira/browse/SOLR-284
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Eric Pugh
> Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: libs.zip, rich.patch, rich.patch, rich.patch,
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch,
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch,
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip,
> test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler
> that supports streaming a PDF, Word, PowerPoint, or Excel document into
> Solr.
> There is a wiki page with information here:
> http://wiki.apache.org/solr/UpdateRichDocuments
>