There seems to be some code out for Tika now (not packaged/announced yet,
but...). Could someone please take a look at it and see if that could fit
in? I am eagerly waiting for a reply back from tika-dev, but no luck yet.

http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apache/tika/

I see that Eric's patch uses POI (for most of it)...so that's great! I have
seen too many duplicated efforts, even in Apache projects alone, and this is
one step closer to fixing it (other than Tika, which isn't 'complete' yet).
Are there any plans to release this patch with the Solr distribution? Or are
there any instructions for using/installing the patch itself?
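
For reference, the flow Christian asks about below (parse the documents,
wrap the parsed text in XML, post it to Solr) can be sketched in plain Java
without the patch. This is only a rough sketch and not what SOLR-284 does:
the class name, the field names "id" and "text", and the sample text are
assumptions, and the actual text extraction (POI, Tika, ...) is left out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; none of these names come from Solr or the patch.
public class SolrXmlSketch {

    // Escape the characters that are special in XML text and attribute values.
    // Note: control characters that XML 1.0 disallows are NOT filtered here.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an <add><doc>...</doc></add> update message from name/value pairs.
    public static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // POST an XML message to the given update URL and return the HTTP status.
    public static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Assumed field names; they must match fields defined in your schema.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "report.pdf");
        fields.put("text", "placeholder for text extracted from the file");

        String xml = buildAddXml(fields);
        System.out.println(xml);

        // Only talk to a server if an update URL was passed on the command line.
        if (args.length > 0) {
            System.out.println("add:    HTTP " + post(args[0], xml));
            System.out.println("commit: HTTP " + post(args[0], "<commit/>"));
        }
    }
}
```

Run without arguments it only prints the XML. Passing an update URL (for
example http://localhost:8983/solr/update, assuming a default local install)
posts the document and then a commit.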

Thanks
Vish


On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
>
> Christian,
>
> Eric Pugh implemented this functionality for a project we were
> doing and has released the code on JIRA.  We have had very good results
> with it.  If I can be of any help using it beyond the Java code itself
> let me know.  The last revision I used with it was 552853, so if the
> build happens to fail you can roll back to that and it will work.
>
> https://issues.apache.org/jira/browse/SOLR-284
>
> - Pete
>
> On 8/21/07, Christian Klinger <[EMAIL PROTECTED]> wrote:
> > Hi Solr Users,
> >
> > I have set up a Solr server with a custom schema.
> > Now I have updated the index with some content from
> > XML files.
> >
> > Now I am trying to update the index with the contents of a folder.
> > The folder consists of various document types
> > (pdf, doc, xls, ...).
> >
> > Is there a howto anywhere on how I can parse the
> > documents, make an XML of the parsed content,
> > and post it to the Solr server?
> >
> > Thanks in advance.
> >
> > Christian
> >
> >
>
