[jira] Commented: (SOLR-284) Parsing Rich Document Types

Kristoffer Dyrkorn (JIRA) Tue, 08 Apr 2008 03:08:14 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586735#action_12586735
 ]


Kristoffer Dyrkorn commented on SOLR-284:
-----------------------------------------

Very handy!

It would be useful to have an option to save the extracted text as XML (so 
it can be stored) just before it is added to the Solr index. Then, if the Solr 
schema changes in a way that requires a full reindex, the content can be 
quickly re-fed from a "near source" instead of parsing the original files again.
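
To make the idea concrete, here is a minimal sketch (not part of the patch) of 
what saving a "near source" copy could look like. It writes the extracted text 
in Solr's XML update format (<add><doc><field name="...">) to one file per 
document. The function names, the field names "id" and "text", and the 
one-file-per-document layout are all assumptions made for illustration; the 
real field names would come from your schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_add_xml(doc_id, extracted_text, id_field="id", text_field="text"):
    """Return a Solr XML update message (<add><doc>...) as a string.

    id_field and text_field are hypothetical schema field names.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name=id_field).text = doc_id
    ET.SubElement(doc, "field", name=text_field).text = extracted_text
    # ElementTree escapes characters such as < and & when serializing.
    return ET.tostring(add, encoding="unicode")


def save_near_source(doc_id, extracted_text, store_dir):
    """Write the update message to <store_dir>/<doc_id>.xml and return the path.

    Assumes doc_id is safe to use as a file name.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{doc_id}.xml"
    path.write_text(build_add_xml(doc_id, extracted_text), encoding="utf-8")
    return path
```

After a schema change, the saved files could be posted back to the XML update 
handler one by one, skipping the rich-document parsing step entirely. Text 
extracted from PDF or Office files may contain control characters that are not 
legal in XML, so a real implementation would probably need to filter those 
before writing.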

> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Eric Pugh
>             Fix For: 1.3
>
>         Attachments: libs.zip, rich.patch, source.zip, test-files.zip, 
> test.zip
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, PowerPoint, or Excel document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
