[ https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789300#action_12789300 ]
Akshay K. Ukey edited comment on SOLR-1358 at 12/11/09 1:23 PM:
----------------------------------------------------------------

Patch with test case and with tika parser configurable via parser attribute for entity tag.

  was (Author: akshay):
  Patch with test case, tika parser configurable via parser attribute for entity tag.

> Integration of Tika and DataImportHandler
> -----------------------------------------
>
>                 Key: SOLR-1358
>                 URL: https://issues.apache.org/jira/browse/SOLR-1358
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>            Reporter: Sascha Szott
>            Assignee: Noble Paul
>         Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr so that it builds up
> documents from data that comes from both PDF documents and database table
> columns. Currently, to accomplish this task, it's up to the user to add some
> preprocessing that converts PDF files into plain text files. Therefore, I
> would like to see an integration of Solr Cell into DIH that makes such
> preprocessing obsolete.
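For illustration, a DIH data-config.xml along the following lines would build each Solr document from database columns plus text extracted from a PDF, which is the use case described in the issue. This is a sketch that assumes the TikaEntityProcessor as it later shipped in Solr's DataImportHandler extras contrib (with its url, format and parser attributes); the attribute names in this particular patch may differ. The JDBC driver and URL, the docs table and its columns, and the Solr field names are hypothetical placeholders.

```xml
<dataConfig>
  <!-- Hypothetical JDBC source for the table columns (driver/URL are placeholders) -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/example" user="solr" password="secret"/>
  <!-- Binary file source that hands raw file bytes to Tika -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: one row per document from a hypothetical "docs" table -->
    <entity name="doc" dataSource="db"
            query="select id, title, file_path from docs">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Inner entity: parse the PDF referenced by the current row.
           The parser attribute picks an explicit Tika parser class. -->
      <entity name="pdf" dataSource="bin" processor="TikaEntityProcessor"
              url="${doc.file_path}" format="text"
              parser="org.apache.tika.parser.pdf.PDFParser">
        <!-- Extracted body text is exposed in the "text" column -->
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the parser attribute is omitted, TikaEntityProcessor (as later documented) falls back to Tika's AutoDetectParser, which chooses a parser based on the detected content type; naming a specific parser class is mainly useful when all inputs are known to share one format.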