On Nov 12, 2008, at 12:56 PM, Shalin Shekhar Mangar wrote:

I think the best way would be a TikaEntityProcessor which knows how to
handle documents. I guess a typical use-case would be
FileListEntityProcessor->TikaEntityProcessor as parent-child entities.

Also see SOLR-833, which adds a FieldReaderDataSource that lets you pass any
field's content to an entity for processing. So you can have a
[SqlEntityProcessor, JdbcDataSource] producing a blob and a
[FieldReaderDataSource, TikaEntityProcessor] consuming it.
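
To make the parent-child idea concrete, here is a rough, self-contained Java sketch of the data flow: a parent entity produces rows carrying a blob, and a child entity reads that field and turns it into text. This is not the DIH API. The EntityProcessor interface, the blobProducer, textExtractor and run helpers, and the "blob" and "text" field names are all hypothetical and exist only for illustration. The extractor is a placeholder that decodes UTF-8; a real TikaEntityProcessor would presumably hand the bytes to Tika instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ParentChildEntitySketch {

    /** Hypothetical stand-in for an entity processor: maps one parent row to child rows. */
    public interface EntityProcessor {
        List<Map<String, Object>> rows(Map<String, Object> parentRow);
    }

    /** Parent entity (think SqlEntityProcessor): one row per record, each with a "blob" field. */
    public static EntityProcessor blobProducer(Map<String, byte[]> records) {
        return parentRow -> {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map.Entry<String, byte[]> e : records.entrySet()) {
                Map<String, Object> row = new LinkedHashMap<>();
                row.put("id", e.getKey());
                row.put("blob", e.getValue());
                out.add(row);
            }
            return out;
        };
    }

    /** Child entity: reads the named field from the parent row and extracts text from it. */
    public static EntityProcessor textExtractor(String field, Function<byte[], String> extractor) {
        return parentRow -> {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("text", extractor.apply((byte[]) parentRow.get(field)));
            return List.of(row);
        };
    }

    /** Runs the child once per parent row and merges both rows into one document. */
    public static List<Map<String, Object>> run(EntityProcessor parent, EntityProcessor child) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map<String, Object> parentRow : parent.rows(Map.of())) {
            for (Map<String, Object> childRow : child.rows(parentRow)) {
                Map<String, Object> doc = new LinkedHashMap<>(parentRow);
                doc.remove("blob"); // drop the raw bytes; this sketch keeps only the extracted text
                doc.putAll(childRow);
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, byte[]> records = new LinkedHashMap<>();
        records.put("1", "hello".getBytes(StandardCharsets.UTF_8));
        // Placeholder extractor (assumption): plain UTF-8 decoding instead of a Tika call.
        Function<byte[], String> extractor = b -> new String(b, StandardCharsets.UTF_8);
        System.out.println(run(blobProducer(records), textExtractor("blob", extractor)));
    }
}
```

The point of the sketch is only that the child never needs to know where the bytes came from: a file listing or a SQL blob column would both fit the same shape.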

I think such an integration will be very interesting. Let me know if you
need a hand, I'm willing to contribute in whatever way possible.


OK, I will likely focus on getting SOLR-284 done first; then we can figure out how to integrate, refactor, and extract commonalities for DIH. So for now, just keep an eye on SOLR-284 and test it.
