[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724855#action_12724855
 ] 

Yonik Seeley commented on SOLR-284:
-----------------------------------

Not sure if I should open a new issue or keep improvements here.
I think we need to improve the out-of-the-box experience with this; see:
http://search.lucidimagination.com/search/document/302440b8a2451908/solr_cell

Ideas for improvement:
- auto-map names of the form Last-Modified to a more Solr-ish field name
  like last_modified
- drop "ext." from parameter names, and revisit naming to try to unify with
  other update handlers like CSV
  note: in the future, one could see generic functionality like boosting
  fields, setting field value defaults, etc., being handled by a generic
  component or update processor... all the better reason to drop the ext
  prefix.
- I imagine that metadata is normally useful, so we should
  1. predefine commonly used metadata fields in the example schema... there's
     really no cost to this
  2. use mappings to normalize any metadata names (if such normalization isn't
     already done in Tika)
  3. ignore or drop fields that have little use
  4. provide a way to handle new attributes w/o dropping them or throwing an
     error
- enable the handler by default, loaded lazily to avoid a dependency on having
  all the Tika libs available
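
To make the metadata ideas above concrete, here is a rough sketch in Python
(not Solr code, and not how the handler works today). It assumes the
auto-mapping is "lowercase, then turn runs of non-alphanumerics into
underscores", and that unknown attributes get a catch-all prefix. The
function names, the field tables, and the "attr_" prefix are all made up
for illustration.

```python
import re

# Illustrative tables only; none of these names come from Solr or Tika.
KNOWN_FIELDS = {"last_modified", "content_type", "author", "title"}
ALIASES = {"creator": "author"}   # normalize synonyms to one field
IGNORED = {"xmptpg_npages"}       # metadata judged to have little use
UNKNOWN_PREFIX = "attr_"          # hypothetical catch-all for new attributes


def solrish(name):
    """Lowercase the name and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def map_metadata(metadata):
    """Return a dict of field name -> value for the given raw metadata."""
    doc = {}
    for raw_name, value in metadata.items():
        name = solrish(raw_name)
        name = ALIASES.get(name, name)
        if name in IGNORED:
            continue  # idea 3: drop fields that have little use
        if name not in KNOWN_FIELDS:
            # idea 4: keep new attributes instead of dropping or erroring
            name = UNKNOWN_PREFIX + name
        doc[name] = value
    return doc
```

With these tables, map_metadata({"Last-Modified": "x", "Company": "c"})
would keep the first value under last_modified and the second under
attr_company. In Solr itself the catch-all could presumably be backed by a
dynamic field in the example schema, but that is an assumption, not
something decided in this issue.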


> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, 
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, 
> test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, PowerPoint, or Excel document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
