Re: Use of tika for parsing, offsets questions

2009-09-04 Thread David Causse
On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote:
> Hi,
>
> On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote:
> > If I use tika for parsing HTML code and inject parsed String to a lucene
> > analyzer. What about the offset information for KWIC and return to text
> > (like the google

Re: Use of tika for parsing, offsets questions

2009-09-03 Thread Grant Ingersoll
On Sep 2, 2009, at 5:40 AM, David Causse wrote:
Hi,
If I use tika for parsing HTML code and inject parsed String to a lucene analyzer. What about the offset information for KWIC and return to text (like the google cache view)? how can I keep track of the offsets between tika parser and lu

RE: Use of tika for parsing, offsets questions

2009-09-03 Thread Uwe Schindler
.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
> Sent: Thursday, September 03, 2009 3:07 PM
> To: java-user@lucene.apache.org; David Causse
> Subject: Re: Use of t

Re: Use of tika for parsing, offsets questions

2009-09-03 Thread Jukka Zitting
Hi,

On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote:
> If I use tika for parsing HTML code and inject parsed String to a lucene
> analyzer. What about the offset information for KWIC and return to text
> (like the google cache view)? how can I keep track of the offsets
> between tika parser and