There seems to be some code out for Tika now (not packaged/announced yet, but...). Could someone please take a look at it and see if that could fit in? I am eagerly waiting for a reply back from tika-dev, but no luck yet.
http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apache/tika/

I see that Eric's patch uses POI (for most of it)... so that's great! I have seen too many duplicated efforts, even in Apache projects alone, and this is one step closer to fixing it (other than Tika, which isn't 'complete' yet).

Are there any plans on releasing this patch with the Solr dist? Or any instructions on using/installing the patch itself?

Thanks
Vish

On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
>
> Christian,
>
> Eric Pugh implemented this functionality for a project we were
> doing and has released the code on JIRA. We have had very good results
> with it. If I can be of any help using it beyond the Java code itself
> let me know. The last revision I used with it was 552853, so if the
> build happens to fail you can roll back to that and it will work.
>
> https://issues.apache.org/jira/browse/SOLR-284
>
> - Pete
>
> On 8/21/07, Christian Klinger <[EMAIL PROTECTED]> wrote:
> > Hi Solr Users,
> >
> > I have set up a Solr server with a custom schema.
> > Now I have updated the index with some content from
> > XML files.
> >
> > Now I am trying to update the contents of a folder.
> > The folder consists of various document types
> > (pdf, doc, xls, ...).
> >
> > Is there a howto anywhere on how I can parse the
> > documents, make an XML of the parsed content,
> > and post it to the Solr server?
> >
> > Thanks in advance.
> >
> > Christian
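
For the "make an XML of the parsed content and post it" part of Christian's question, here is a rough sketch in Python of what the last two steps could look like once the text has already been extracted (by the SOLR-284 patch, POI, or any other tool; extraction itself is not shown). The field names `id` and `text` and the update URL are assumptions for illustration only; they have to match whatever the custom schema and server setup actually use.

```python
import re
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Control characters that are not allowed in XML 1.0; text extracted
# from PDF/Office files often contains a few of these.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def build_add_xml(docs):
    """Build a Solr <add> message from a list of {field_name: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            clean = _INVALID_XML_CHARS.sub("", str(value))
            # quoteattr() adds the surrounding quotes for the attribute.
            parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(clean)))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_to_solr(xml, url="http://localhost:8983/solr/update"):
    """POST an XML update message to Solr (the URL is an assumed default)."""
    request = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical document: "id" and "text" must exist in your schema.
    message = build_add_xml([{"id": "report.pdf", "text": "extracted text here"}])
    print(message)
    # post_to_solr(message)      # send the document
    # post_to_solr("<commit/>")  # make it visible to searches
```

The posting calls are left commented out so the script can be run without a server; the documents only show up in search results after the `<commit/>` message has been sent.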