Re: Bug In IndexWriter.addDocument?

2008-07-07 Thread Ajay Lakhani
Dear Digy, To add on, I think this may not be a glitch. A TokenStream is usually not stored. If you change your field attribute to *org.apache.lucene.document.Field.Store.NO* then there will be no issue. Developers, any thoughts on this? Cheers Ajay 2008/7/8 Ajay Lakhani <[EMAIL PROTEC

Re: Bug In IndexWriter.addDocument?

2008-07-07 Thread Ajay Lakhani
Dear Digy, As of Lucene 2.3, there are new setValue(...) methods that allow you to change the value of a Field. However, there seems to be an issue with the org.apache.lucene.index.FieldsWriter.writeField(...) API that stores the string value for the field, which happens to be null in the case of a

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

2008-07-07 Thread Tavi Nathanson (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611453#action_12611453 ] Tavi Nathanson commented on LUCENE-794: --- Hey everyone, I'm having some trouble getti

Re: Fwd: ThreadLocal in SegmentReader

2008-07-07 Thread Adrian Tarau
Usually ThreadLocal.remove() should be called at the end (in a finally block), before the current call leaves your code. For example: if ThreadLocal is used during searching, every search(..) method should clean up any ThreadLocal variables, or this should happen even deeper in the implementation. When the call leaves Luc
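
The set/use/remove pattern described in the message above can be sketched roughly as follows. This is a minimal illustration and not Lucene code: the class ThreadLocalCleanup, the SCRATCH variable and the search(String) method are hypothetical names chosen only for the example.

```java
// Hypothetical helper, not part of Lucene: illustrates clearing a
// ThreadLocal in a finally block before the call returns.
final class ThreadLocalCleanup {
    // Per-thread scratch buffer; the name SCRATCH is illustrative only.
    private static final ThreadLocal<StringBuilder> SCRATCH = new ThreadLocal<>();

    // Stand-in for a search(..) method that uses thread-local state.
    static String search(String query) {
        SCRATCH.set(new StringBuilder());
        try {
            StringBuilder sb = SCRATCH.get();
            sb.append("results for ").append(query);
            return sb.toString();
        } finally {
            // Clear this thread's slot so the thread no longer references
            // the value once control leaves this method.
            SCRATCH.remove();
        }
    }

    // True if the calling thread currently holds no value in SCRATCH.
    static boolean isCleared() {
        return SCRATCH.get() == null;
    }
}
```

Because remove() sits in the finally block, the slot is cleared even when the body throws, which matters for pooled threads that outlive the request.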

Bug In IndexWriter.addDocument?

2008-07-07 Thread Digy
Hi all, I am a Lucene.Net user. Since I need fast indexing in my current project, I am trying to use Lucene 2.3.2, which I converted to .Net with IKVM (since Lucene.Net is currently at v2.1), and I reuse the same instances of document and fields to gain some speed improvements. I use TokenStreams to se

Fwd: ThreadLocal in SegmentReader

2008-07-07 Thread Michael McCandless
ThreadLocal, which we use in several places in Lucene, causes a leak in app servers because the classloader never fully deallocates Lucene's classes because the ThreadLocal is holding strong references. Yet, ThreadLocal is very convenient for avoiding synchronization. Does anyone have any

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yonik Seeley
On Mon, Jul 7, 2008 at 5:03 PM, Yajun <[EMAIL PROTECTED]> wrote: > > I'm adding tons of logging, hopefully it will give me some information. Try capturing the directory contents before you take a snapshot... something like ls -l index > index/ls.txt Then if a missing file turns up, you can compar
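
A programmatic analogue of the suggestion above (record the directory contents before the snapshot, then compare later) might look like the sketch below. It is only an assumption about how one could do this from Java; the class IndexListing and its methods are hypothetical, and it records file names only rather than the full `ls -l` output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Solr.
final class IndexListing {
    // Returns the sorted names of the entries directly inside dir.
    static List<String> list(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    // Writes the current listing of dir to out, one name per line.
    // The listing is taken before out is created.
    static void record(Path dir, Path out) throws IOException {
        Files.write(out, list(dir));
    }

    // Returns names present in the recorded listing but absent from dir now.
    static List<String> missing(Path recorded, Path dir) throws IOException {
        Set<String> now = new HashSet<>(list(dir));
        List<String> gone = new ArrayList<>();
        for (String name : Files.readAllLines(recorded)) {
            if (!now.contains(name)) {
                gone.add(name);
            }
        }
        return gone;
    }
}
```

Calling record(...) just before taking the snapshot and missing(...) when a FileNotFoundException shows up would indicate whether the file existed at snapshot time.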

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yajun
I'm adding tons of logging, hopefully it will give me some information. --Yajun Michael McCandless-2 wrote: > > > Ahh right your validation would catch the IndexWriter-still-open case. > > It seems like something external to Lucene is messing up your index. > It's odd. > > Mike > > Yaju

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Michael McCandless
Ahh right your validation would catch the IndexWriter-still-open case. It seems like something external to Lucene is messing up your index. It's odd. Mike Yajun wrote: Mike, Not very sure about whether IndexWriter is closed. The index update goes through Solr, I'll debug it tonight.

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yajun
Mike, Not very sure about whether IndexWriter is closed. The index update goes through Solr, I'll debug it tonight. Even if IndexWriter is not closed, since I "validate" the snapshot, the snapshot should at least keep being "validated". :-) --Yajun Michael McCandless-2 wrote: > > > On

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Michael McCandless
One more question: are you sure that when you copy out your snapshot, the IndexWriter was closed? If IndexWriter is open when the copy is done, it's possible to get a corrupted copy. Mike Yajun wrote: I don't use deleteDocument nor setNorm with IndexReader. I tried both using hard l

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yajun
I don't use deleteDocument nor setNorm with IndexReader. I tried both using hard links (cp -lr) and copy (cp -r) to create snapshot, both have the same problem. It seems that the segments file segments_xxx references a segment that no longer exists in the index directory. I used to delete all the "inval

RE: maven snapshot repository

2008-07-07 Thread Steven A Rowe
On 07/04/2008 at 3:28 PM, Karl Wettin wrote: > The snapshots seem to be built every day, but it seems to be producing > some jars of a non-trunk revision or branch. Perhaps 2.3.2? I looked at MANIFEST.MF from lucene-core-2.3-SNAPSHOT.jar (the date for this file in the web page listing on the snap

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Michael McCandless
Yajun wrote: YL>> The lucene library comes with Solr. The jar file is lucene-core-2007-05-20_00-04-53.jar. I compared the source code of IndexReader which is close to Lucene 2.2, not 2.1 Hmmm OK thanks. This part is odd: When this happen, the program automatically tried to reopen the most

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yajun
My answer is inline. Michael McCandless-2 wrote: > > Yajun Liu (JIRA) <[EMAIL PROTECTED]> wrote: > > Can you double check which underlying version of Lucene you are > using? Those source file/line numbers don't line up to a stock 2.1 > release as far as I can tell. > > YL>> The lucene libra

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yajun
My bad, we don't use /tmp explicitly. We use /var/tmp/snapshot_timestamp, which is not deleted by the OS on reboot. --Yajun Robert Engels wrote: > > If your "automatic recycle" means a restart/reboot, the /tmp > directory is probably being cleared by the OS and you might have a > startup race

[jira] Updated: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-07-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1329: - Attachment: lucene-1329.patch lucene-1329.patch > Remove synchronization in SegmentRea

[jira] Created: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-07-07 Thread Jason Rutherglen (JIRA)
Remove synchronization in SegmentReader.isDeleted - Key: LUCENE-1329 URL: https://issues.apache.org/jira/browse/LUCENE-1329 Project: Lucene - Java Issue Type: Improvement Components:

[jira] Updated: (LUCENE-1314) IndexReader.reopen(boolean force)

2008-07-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1314: - Attachment: lucene-1314.patch lucene-1314.patch Implemented copy on write for norms and

Re: TokenStream#reset():boolean?

2008-07-07 Thread Michael McCandless
If we make this change (migrate to "boolean TokenStream.reset()"), what would IndexWriter do if it calls reset and false is returned? Are you saying that you want to get to the point where we force all TokenStream subclasses to implement reset (and return true)? Should we just eventually

Re: How efficient is IndexReader?

2008-07-07 Thread Michael McCandless
If the thing you are retrieving per doc is a stored field or a term vector, and you're talking about millions of docs, this will likely be too slow, unless your entire index can fit in the OS's IO cache. This use case is probably a good fit for column-stride fields: https://issues.apache

Re: about 2.4 release date?

2008-07-07 Thread Michael McCandless
It's not really clear at this point when 2.4 may be released. There hasn't been a lot of discussion... That said, there are some important pending changes in 2.4 so I do think at some point soonish we should get a release out. Mike paulgao wrote: I would like to ask about the 2.4 vers

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Michael McCandless
Yajun Liu (JIRA) <[EMAIL PROTECTED]> wrote: Can you double check which underlying version of Lucene you are using? Those source file/line numbers don't line up to a stock 2.1 release as far as I can tell. This part is odd: > When this happen, the program automatically tried to reopen the most >

How efficient is IndexReader?

2008-07-07 Thread blazingwolf7
Hi, I want to use a Reader to read a document every time a matching document is found during search time. So basically, every time the score for a document is calculated, I will use the reader to retrieve some information from the index. Will this lower the searching performance? I m

RE: Untokenized URL

2008-07-07 Thread blazingwolf7
Thanks for the help Uwe Schindler wrote: > > Hi, > > Read here: http://wiki.apache.org/lucene-java/LuceneFAQ > > And I think that this type of question is more for the Lucene Users > mailing > list > (http://lucene.apache.org/java/docs/mailinglists.html#Java%20User%20List). > This list is fo

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread robert engels
If your "automatic recycle" means a restart/reboot, the /tmp directory is probably being cleared by the OS and you might have a startup race condition. On Jul 7, 2008, at 2:17 AM, Yajun Liu (JIRA) wrote: FileNotFoundException in - Key: LUCENE-1328

RE: Untokenized URL

2008-07-07 Thread Uwe Schindler
Hi, Read here: http://wiki.apache.org/lucene-java/LuceneFAQ And I think that this type of question is more for the Lucene Users mailing list (http://lucene.apache.org/java/docs/mailinglists.html#Java%20User%20List). This list is for developers of Lucene itself, not for users asking for help how

[jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yajun Liu (JIRA)
FileNotFoundException in - Key: LUCENE-1328 URL: https://issues.apache.org/jira/browse/LUCENE-1328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.1 Environment: OS

RE: Untokenized URL

2008-07-07 Thread blazingwolf7
Well, I am open to suggestions, except for using a reader. Document.get() & Co., how does it work? Uwe Schindler wrote: > > As Shai told before, you should store the field twice: As tokenized field > for your search and with a different name (e.g. "field-untokenized"). For > your TermEnum Code