SpanNotQuery.hashCode cut/paste error?

2006-05-16 Thread Chris Hostetter
SpanNotQuery's hashCode method makes two references to include.hashCode(), but none to exclude.hashCode() ... this is a mistake, yes/no? -Hoss

[jira] Commented: (LUCENE-569) NearSpans skipTo bug

2006-05-16 Thread paul.elschot (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-569?page=comments#action_12411904 ] paul.elschot commented on LUCENE-569: > I tried to make sense of the existing NearSpans implementation over the weekend ... I did not succeed. I still haven't had a c

Re: OpenBitSet

2006-05-16 Thread eks dev
>Weird... I'm not sure how that could be. Are you sure you didn't get >the numbers reversed? That is exactly what happened, sorry for the wrong numbers. Now it looks as it should: java -version Java(TM) SE Runtime Environment (build 1.6.0-beta2-b83) Java HotSpot(TM) Client VM (build 1.6.0-beta2-b8

Re: Jira Convention: Resolved vs Closed

2006-05-16 Thread Erik Hatcher
I've historically treated Closed and Resolved as the same thing and have closed resolved issues just to set them to that state. Erik On May 15, 2006, at 9:24 PM, Chris Hostetter wrote: Is there a documented or unspoken policy about the "Resolved" vs "Closed" bug statuses? How/wh

Re: SpanNotQuery.hashCode cut/paste error?

2006-05-16 Thread Erik Hatcher
Yes, this is a mistake. I'm happy to fix it, but it looks like you have other patches in progress. Erik On May 16, 2006, at 3:33 AM, Chris Hostetter wrote: SpanNotQuery's hashCode method makes two references to include.hashCode(), but none to exclude.hashCode() ... this is a mistak
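The kind of cut/paste error discussed in this thread can be illustrated with a minimal sketch. IncludeExcludeQuery below is a hypothetical stand-in, not the Lucene SpanNotQuery source, and the multiplier 31 is an arbitrary choice for the illustration. The point is only that every field compared in equals() should also feed hashCode().

```java
import java.util.Objects;

// Hypothetical stand-in for a query with an include and an exclude clause.
// This is NOT the Lucene source; names and the hash formula are assumptions.
final class IncludeExcludeQuery {
    private final Object include;
    private final Object exclude;

    IncludeExcludeQuery(Object include, Object exclude) {
        this.include = Objects.requireNonNull(include);
        this.exclude = Objects.requireNonNull(exclude);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IncludeExcludeQuery)) return false;
        IncludeExcludeQuery other = (IncludeExcludeQuery) o;
        return include.equals(other.include) && exclude.equals(other.exclude);
    }

    @Override
    public int hashCode() {
        // Both fields take part. Hashing include twice (the cut/paste slip)
        // would still satisfy the equals/hashCode contract, but queries that
        // differ only in exclude would always collide.
        int h = include.hashCode();
        h = 31 * h + exclude.hashCode();
        return h;
    }
}
```

A hashCode that ignores exclude is therefore a quality problem rather than a correctness problem: hash-based caches keep working, but with more collisions than necessary.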

RE: Nio File Caching & Performance Test

2006-05-16 Thread Robert Engels
My tests still hold that the NioFile I submitted is significantly faster than the standard FSDirectory. BUT, the memory mapped implementation is significantly faster than NioFile. I attribute this to the overhead of managing the soft references, and possible GC interaction. SO, I would like to us

Re: Nio File Caching & Performance Test

2006-05-16 Thread Doug Cutting
Robert Engels wrote: SO, I would like to use a memory mapped reader, but I encounter OOM errors when mapping large files, due to running out of address space. Has anyone found a solution for this? (A 2 gig index is not all that large...). A 64-bit hardware, OS and JVM solves this nicely. On 3

Re: Nio File Caching & Performance Test

2006-05-16 Thread Yonik Seeley
On 5/16/06, Robert Engels <[EMAIL PROTECTED]> wrote: SO, I would like to use a memory mapped reader, but I encounter OOM errors when mapping large files, due to running out of address space. Pretty much all x86 servers sold are 64 bit capable now. Run a 64 bit OS if you can :-) Has anyone fou

Re: Jira Convention: Resolved vs Closed

2006-05-16 Thread Doug Cutting
Chris Hostetter wrote: How/when should a resolved bug be closed? I close bugs after their "fix version" is released. The distinction between "resolved" and "closed" is intended for projects with a formal QA process. An engineer fixes a bug and marks it "resolved", and then a tester verifies

Phrase IDF and collection frequency !

2006-05-16 Thread ABDOU Samir
Hi, Are there any ideas on how to compute the "document frequency" and "collection frequency" of phrases? Document frequency is the number of documents containing the phrase. Collection frequency is the frequency of the phrase in the whole collection. Thanks in advance for any help Sam
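The two definitions in this question can be made concrete with a small sketch over documents that are already tokenized in memory. It does not use any Lucene API; PhraseStats and its method names are hypothetical, and overlapping occurrences are counted separately, which is an assumption the question does not settle.

```java
import java.util.List;

// Hypothetical helper, not part of Lucene: brute-force phrase statistics
// over documents that are already split into tokens.
final class PhraseStats {
    // Times the phrase occurs as a contiguous token run in one document.
    static int occurrences(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) return 0;
        int count = 0;
        for (int i = 0; i + phrase.size() <= doc.size(); i++) {
            if (doc.subList(i, i + phrase.size()).equals(phrase)) count++;
        }
        return count;
    }

    // Document frequency: number of documents containing the phrase at least once.
    static int documentFrequency(List<List<String>> docs, List<String> phrase) {
        int df = 0;
        for (List<String> doc : docs) {
            if (occurrences(doc, phrase) > 0) df++;
        }
        return df;
    }

    // Collection frequency: total occurrences of the phrase across all documents.
    static int collectionFrequency(List<List<String>> docs, List<String> phrase) {
        int cf = 0;
        for (List<String> doc : docs) {
            cf += occurrences(doc, phrase);
        }
        return cf;
    }
}
```

For the three documents "new york new york", "new jersey" and "york new york", the phrase "new york" has a document frequency of 2 and a collection frequency of 3.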

FieldsReader synchronized access vs. ThreadLocal ?

2006-05-16 Thread Robert Engels
In SegmentReader, currently the access to FieldsReader.doc(n) is synchronized (which it must be). Does it not make sense to use a ThreadLocal implementation similar to the TermInfosReader? It seems that in a highly multi-threaded server this synchronized method could lead to significant blocking
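The general pattern behind this suggestion can be sketched as follows. PerThreadReader is a hypothetical wrapper, not the FieldsReader or TermInfosReader code: instead of serializing all threads on one shared, stateful reader, each thread lazily gets its own instance from a factory. The trade-off is one instance (and whatever file handles or buffers it owns) per thread.

```java
import java.util.function.Supplier;

// Hypothetical wrapper, not Lucene code: hands each thread its own instance
// of a non-thread-safe resource so that no synchronization is needed on use.
final class PerThreadReader<R> {
    private final ThreadLocal<R> local;

    PerThreadReader(Supplier<R> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's private instance.
    R get() {
        return local.get();
    }
}
```

Whether this is worth the extra per-thread state depends on whether the synchronized method is actually a measured bottleneck.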

Re: OpenBitSet

2006-05-16 Thread Chris Hostetter
: I also measured on different densities, and it looks about the same. : When I find a few spare minutes I will make one PerfTest that generates : gnuplot diagrams. It would be interesting to see how all key methods behave : as a function of density/size. I was thinking the same thing ... i just haven'

query question

2006-05-16 Thread Dedian Guo
I am not sure if this is a question, but could anybody tell me if the query syntax can do the select...from...where job as in a traditional database? I have checked the Lucene query syntax, but it does not seem as complex as SQL...correct me if I am wrong, or there is no such requirement for searching engine

Re: OpenBitSet

2006-05-16 Thread eks dev
Yeah, good hint. We actually made such measurements on a TreeIntegerSet implementation, and it is totally astonishing what you get as a result (I remember 6Meg against 2k memory consumption for "predominantly sorted bit vectors" like zip codes, conjunction/disjunction speed an order of magnitude faster

Re: Nio File Caching & Performance Test

2006-05-16 Thread eks dev
Hi Robert, I might easily be wrong, but I believe I saw something on JIRA (or was it bugzilla?) a long, long time ago, where somebody made an MMAP implementation for really big indexes that works on 32 bit. I guess it is worth checking. - Original Message From: Yonik Seeley <[EMAIL PROT

java.lang.IndexOutOfBoundsException when querying Lucene

2006-05-16 Thread Alexandru Popescu
Hi! I have quite a complex query that gets executed against the JCR content (which uses Lucene for indexing/searching). From time to time I am seeing this exception: [trace] java.lang.IndexOutOfBoundsException: Index: 99, Size: 27 at java.util.ArrayList.RangeCheck(ArrayList.java:546)

RE: Nio File Caching & Performance Test

2006-05-16 Thread Robert Engels
The MMapDirectory works for really big indexes (larger than 2 gig), BUT if the JVM does not have enough address space (32 bit JVM) it will not work. -Original Message- From: eks dev [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 2:20 PM To: java-dev@lucene.apache.org Subject: Re: Nio
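The two limits in this message are separate, and a sketch may help keep them apart. A single Java mapping made with FileChannel.map cannot exceed Integer.MAX_VALUE bytes, so a file larger than 2 gig has to be mapped as several chunks. ChunkedMmap below is a hypothetical illustration of that idea, not the MMapDirectory source. Chunking does nothing about the second limit: every chunk still consumes address space, so a 32 bit JVM can still run out.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not Lucene code: read-only access to a file
// through several memory-mapped chunks, addressed by a long position.
final class ChunkedMmap {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;
    private final long length;

    ChunkedMmap(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        MappedByteBuffer[] mapped;
        long size;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            size = ch.size();
            int n = (int) ((size + chunkSize - 1) / chunkSize); // round up
            mapped = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = (long) i * chunkSize;
                long len = Math.min(chunkSize, size - start); // last chunk may be short
                mapped[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, len);
            }
        } // mappings stay valid after the channel is closed
        this.chunks = mapped;
        this.chunkSize = chunkSize;
        this.length = size;
    }

    long length() {
        return length;
    }

    // Reads one byte at an absolute file position.
    byte get(long pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos=" + pos);
        return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }
}
```

A tiny chunk size such as 4 is only useful for testing; a real chunk size would be large, up to Integer.MAX_VALUE.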

non indexed field searching?

2006-05-16 Thread Robert Engels
I know I (and others) have brought this up before, but maybe now with the lazy field loading (seemingly due to larger documents being stored) it is time to revisit. It seems that maybe a query could be separated into Filter and Query clauses (similar to how the query optimizer works in Nutch).

Re: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-16 Thread Doug Cutting
Robert Engels wrote: It seems that in a highly multi-threaded server this synchronized method could lead to significant blocking when the documents are being retrieved? Perhaps, but I'd prefer to wait for someone to demonstrate this as a performance bottleneck before adding another ThreadLocal

Re: non indexed field searching?

2006-05-16 Thread Erik Hatcher
On May 16, 2006, at 3:37 PM, Robert Engels wrote: It seems that maybe a query could be separated into Filter and Query clauses (similar to how the query optimizer works in Nutch). Clauses that were based on non-indexed fields would be converted to a Filter. The problem is if you have some t

Re: Phrase IDF and collection frequency !

2006-05-16 Thread Tatu Saloranta
--- ABDOU Samir <[EMAIL PROTECTED]> wrote: > Hi, > > Are there any ideas on how to compute the "document > frequency" and "collection frequency" of phrases? Tokenize your input as phrases (instead of words), and you'll get this the same way you normally get stats for single-word tokens (Terms)?

Hacking Luke for bytecount-based strings

2006-05-16 Thread Marvin Humphrey
Greets, There does not seem to be a lot of demand for one implementation of Lucene to read indexes generated by another implementation of Lucene for the purposes of indexing or searching. However, there is a demand for index browsing via Luke. It occurred to me today that if Luke were po

RE: Hacking Luke for bytecount-based strings

2006-05-16 Thread Robert Engels
While you're at it, why not rewrite Luke in Perl as well... Seems like a great use of your time. -Original Message- From: Marvin Humphrey [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 11:36 PM To: java-dev@lucene.apache.org Cc: Andrzej Bialecki Subject: Hacking Luke for bytecount-

Re: Hacking Luke for bytecount-based strings

2006-05-16 Thread Paul Elschot
On Wednesday 17 May 2006 06:35, Marvin Humphrey wrote: > Greets, > > There does not seem to be a lot of demand for one implementation of > Lucene to read indexes generated by another implementation of Lucene > for the purposes of indexing or searching. However, there is a > demand for index