RE: Untokenized URL

2008-07-07 Thread blazingwolf7
CTED] > >> -Original Message- >> From: blazingwolf7 [mailto:[EMAIL PROTECTED] >> Sent: Monday, July 07, 2008 9:15 AM >> To: java-dev@lucene.apache.org >> Subject: RE: Untokenized URL >> >> >> Well, I am open to suggestion, except for using

RE: Untokenized URL

2008-07-07 Thread Uwe Schindler
To: java-dev@lucene.apache.org > Subject: RE: Untokenized URL > > > Well, I am open to suggestions, except for using the reader. The Document.get() & Co., how does it work? > > > Uwe Schindler wrote: > > > > As Shai said before, you should store the field twi

RE: Untokenized URL

2008-07-07 Thread blazingwolf7
; But this does not work with your TermEnum solution. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: [EMAIL PROTECTED] > >> -Original Message- >> From: blazingwolf7 [mailto:[EMAIL PROTECTED] >> Sent: Mond

RE: Untokenized URL

2008-07-06 Thread Uwe Schindler
l Message- > From: blazingwolf7 [mailto:[EMAIL PROTECTED] > Sent: Monday, July 07, 2008 7:39 AM > To: java-dev@lucene.apache.org > Subject: Re: Untokenized URL > > > I am trying to retrieve the url and use it as filter. The main problem is > I > don't want to u

Re: Untokenized URL

2008-07-06 Thread blazingwolf7
I am trying to retrieve the url and use it as filter. The main problem is I don't want to use a reader to continuously retrieve the url for each document located. TermDocs termDocs = reader.termDocs(); TermEnum termEnum = reader.terms (new Term (field, "")); do{ Term term = termEnum.term(); }

Re: Untokenized URL

2008-07-05 Thread Shai Erera
I think that the simplest solution will be to index the URL field twice, once as TOKENIZED and once as UN_TOKENIZED. Then you can look up the un_tokenized term. If you have a document in hand and only want to fetch its URL, then add the URL twice, once as Store.NO, Index.TOKENIZED and once as Store
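The two-field idea above can be illustrated with a toy model. This is a minimal sketch in plain Java, not Lucene code: the class TwoFieldIndexSketch, its methods, and the simple split-on-punctuation tokenizer are all hypothetical stand-ins for what an analyzer and the index would do. It only shows why the same URL indexed both ways supports word search on one field and exact lookup on the other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model (hypothetical, not the Lucene API): one URL is indexed twice,
// once split into word terms and once as a single whole-value term.
class TwoFieldIndexSketch {
    // term -> ids of documents containing it, one map per "field"
    private final Map<String, List<Integer>> tokenized = new HashMap<>();
    private final Map<String, List<Integer>> untokenized = new HashMap<>();
    // stored copy of each URL, addressable by document id
    private final List<String> storedUrls = new ArrayList<>();

    // Assumed tokenizer: lowercase, then split on anything that is not a-z or 0-9.
    static List<String> tokenize(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Adds one document and returns its id.
    int add(String url) {
        int docId = storedUrls.size();
        storedUrls.add(url);
        for (String term : tokenize(url)) {
            List<Integer> docs = tokenized.computeIfAbsent(term, k -> new ArrayList<>());
            // skip the id if a repeated term in the same URL already recorded it
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        // the untokenized "field" keeps the whole URL as a single term
        untokenized.computeIfAbsent(url, k -> new ArrayList<>()).add(docId);
        return docId;
    }

    // Word search against the tokenized terms.
    List<Integer> searchWords(String word) {
        return tokenized.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    // Exact lookup against the untokenized terms.
    List<Integer> lookupExact(String url) {
        return untokenized.getOrDefault(url, Collections.emptyList());
    }

    // Fetches the stored URL for a document already in hand.
    String storedUrl(int docId) {
        return storedUrls.get(docId);
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        int id = index.add("http://www.cnn.com");
        System.out.println(index.searchWords("cnn"));
        System.out.println(index.lookupExact("http://www.cnn.com"));
        System.out.println(index.storedUrl(id));
    }
}
```

In this model the full URL is never a term of the tokenized map, so an exact match has to go through the untokenized one, which is the reason for adding the field twice.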

Re: Untokenized URL

2008-07-05 Thread blazingwolf7
No, I didn't store the contentLength; I just added it to the index. I am still scratching my head over it, as I can't think of another way to retrieve it without continuously using the reader. As for the url, I use doc.add(new Field("url", Store.NO,Index.TOKENIZED). I would like to keep it

Re: Untokenized URL

2008-07-04 Thread Shai Erera
Hi. Regarding the contentLength, when you add it to the document, do you *store* it as well (i.e., passing Store.YES or Store.COMPRESS)? Regarding the URL, how do you add it to the document? For example, if you do doc.add(new Field("url", "http://www.cnn.com", Store.NO, Index.UN_TOKENIZED), i