Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doron Cohen
Doug Cutting wrote:
> > Therefore, a "semi compound" segment file can be defined, that would be
> > made of 4 files (instead of 1):
> > - File 0: .fdx .tis .tvx
> > - File 1: .fdt .tii .tvd
> > - File 2: .frq .tvf
> > - File 3: .fnm .prx .fN
>
> I think this is a promising direction. Perhaps inste

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doron Cohen
Doug Cutting wrote:
> Doug Cutting wrote:
> > Yes. On 32-bit systems with indexes larger than 1GB or so, memory
> > mapping is impractical, so synchronization is required around shared
> > file handles (using Java's classic i/o APIs, w/o pread). The
> > non-compound format, with more files, has f

Re: gdata-server roadmap and open items

2006-12-16 Thread Simon Willnauer
Hey Garth, that is a perfect moment to join. Well I'm on vacation at the moment but I will be back in early January. Can you keep your fingers still until I'm back? I will give you all information you want / need. Would that be ok for you? If not there is a lot of stuff to do with the GData Objec

Re: Lucene nightly build failure

2006-12-16 Thread Grant Ingersoll
This is caused by the same java.io.tmpdir problem with TestFieldsReader.testLazyPerformance. I have been working on nightly builds on lucene.zones (Lucene 708) and I am afraid my testing of the script is conflicting with Doug's nightly cron job when it comes to testing. I think we should

Lucene nightly build failure

2006-12-16 Thread java-dev
javacc-uptodate-check: javacc-notice: [echo] [echo] One or more of the JavaCC .jj files is newer than its corresponding [echo] .java file. Run the "javacc" target to regenerate the artifacts. [echo] init: clover.setup: [mkdir] Created dir: /tmp/lucen

Lucene nightly build failure

2006-12-16 Thread java-dev
javacc-uptodate-check: javacc-notice: [echo] [echo] One or more of the JavaCC .jj files is newer than its corresponding [echo] .java file. Run the "javacc" target to regenerate the artifacts. [echo] init: clover.setup: clover.info: [echo] [ec

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Doug Cutting wrote: Yes. On 32-bit systems with indexes larger than 1GB or so, memory mapping is impractical, so synchronization is required around shared file handles (using Java's classic i/o APIs, w/o pread). The non-compound format, with more files, has fewer synchronization bottlenecks.
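The contrast drawn in this message, a shared file handle that must be locked around seek-then-read versus a positional read (pread) that needs no lock, can be sketched roughly as follows. This is only an illustration in Python on a POSIX system, not Lucene's code: the class names `SharedHandleReader` and `PositionalReader` and the `demo` helper are hypothetical, and `os.pread` stands in for the pread call that the message says Java's classic i/o APIs lack.

```python
import os
import tempfile
import threading


class SharedHandleReader:
    """Hypothetical reader: one shared handle, seek + read guarded by a lock."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._lock = threading.Lock()

    def read_at(self, offset, length):
        # seek() changes state shared by all threads, so the seek/read pair
        # must be atomic; every reader thread queues up on this lock.
        with self._lock:
            self._f.seek(offset)
            return self._f.read(length)

    def close(self):
        self._f.close()


class PositionalReader:
    """Hypothetical reader: pread takes the offset as an argument, no lock."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDONLY)

    def read_at(self, offset, length):
        # os.pread does not move the descriptor's file position, so
        # concurrent callers do not interfere with each other.
        return os.pread(self._fd, length, offset)

    def close(self):
        os.close(self._fd)


def demo(num_threads=8):
    """Read the same 4 bytes from 8 different blocks with both readers."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 16)  # 4096 bytes of repeating 0..255
        path = tmp.name
    try:
        results = {}
        for cls in (SharedHandleReader, PositionalReader):
            reader = cls(path)
            out = [None] * num_threads

            def work(i, reader=reader, out=out):
                out[i] = reader.read_at(i * 256 + 10, 4)

            threads = [threading.Thread(target=work, args=(i,))
                       for i in range(num_threads)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            reader.close()
            results[cls.__name__] = out
        return results
    finally:
        os.unlink(path)
```

Both readers return the same bytes; the difference is only that the first serializes all threads on one lock per file. Under that assumption, splitting an index across more files spreads the locking over more handles, which is one way to read the "fewer synchronization bottlenecks" remark.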

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Doug Cutting wrote:
> I'm not yet convinced that the costs of this mid-point justify its
> benefits.
That was too negative. Let me try a more positive angle.
Doron Cohen wrote: Therefore, a "semi compound" segment file can be defined, that would be made of 4 files (instead of 1): - File 0: .fd

[jira] Resolved: (LUCENE-744) TestFieldsReader - TestLazyPerformance problems w/ permissions in temp dir in multiuser environment

2006-12-16 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-744?page=all ] Grant Ingersoll resolved LUCENE-744. Resolution: Fixed OK, per Chris' suggestion, I changed this to use tempDir. > TestFieldsReader - TestLazyPerformance problems w/ permissions in temp di

[jira] Resolved: (LUCENE-721) Code coverage reports

2006-12-16 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-721?page=all ] Grant Ingersoll resolved LUCENE-721. Resolution: Fixed Committed the change. Linked to the reports from the Resources -> Developers page on the documentation. Current report was generate

[jira] Resolved: (LUCENE-713) File Formats Documentation is not correct for Term Vectors

2006-12-16 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-713?page=all ] Grant Ingersoll resolved LUCENE-713. Resolution: Fixed Put in wording to account for offset and position info in the TVF file. Changes have been committed and website updated (allow time f

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Marvin Humphrey wrote: Out of curiosity, does the non-compound format yield any search-time benefits? Yes. On 32-bit systems with indexes larger than 1GB or so, memory mapping is impractical, so synchronization is required around shared file handles (using Java's classic i/o APIs, w/o pread)

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Marvin Humphrey
On Dec 15, 2006, at 2:04 PM, Otis Gospodnetic wrote: I think Doron is right on the money here. I know one "customer" who'd be happy to trade its file descriptors for less IO - simpy.com. It's exactly what Doron describes - a busy system with a LOT of indices. File descriptors are kept u

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Otis Gospodnetic wrote: I think Doron is right on the money here. I know one "customer" who'd be happy to trade its file descriptors for less IO - simpy.com. It's exactly what Doron describes - a busy system with a LOT of indices. File descriptors are kept under control by carefully closing

Re: 15 minute hang in IndexInput.clone() involving finalizers

2006-12-16 Thread Yonik Seeley
On 12/16/06, Chuck Williams <[EMAIL PROTECTED]> wrote: Applying the patch moved the issue somewhat, but not materially. The setup of the FieldCache comparator still takes the same amount of time and all thread dumps still find the stack inside Object.clone() working on finalizers. Oops, I forg

[jira] Updated: (LUCENE-750) don't use finalizers for FSIndexInput clones

2006-12-16 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-750?page=all ] Yonik Seeley updated LUCENE-750: Attachment: IndexInput_finalizer.patch Forgot to remove the finalizer from FSIndexInput in the first patch. Note that I also removed the finalizer from FSIndexO

Re: Lucene code review

2006-12-16 Thread Erik Hatcher
On Dec 16, 2006, at 3:44 AM, Chris Hostetter wrote: : what they were). Solr had cross-site scripting issues in its JSP : pages, which I think are now all fixed (?). SOLR-74, just resolved. I don't know if i'd really call them XSS issues: they are on the admin pages; if a malicious user has ac

Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-16 Thread robert engels
I can basically GUARANTEE that unless you are using large indexes and loading into a RamDirectory (and use Java prior to 1.5), there is NO ISSUE in using a ThreadLocal. There is something else wrong in your application. PLEASE get a good profiler and perform the required testing. It is

Re: 15 minute hang in IndexInput.clone() involving finalizers

2006-12-16 Thread Paul Elschot
Chuck,
On Saturday 16 December 2006 09:06, Chuck Williams wrote:
> ...
> I can see reading 1 million terms and building the comparator taking a
> while, although not the 15-20 minutes it does, and am baffled at how
> every thread dump on many trials of this issue end up with every one
> inside th

Re: Lucene code review

2006-12-16 Thread Chris Hostetter
: what they were). Solr had cross-site scripting issues in its JSP : pages, which I think are now all fixed (?). SOLR-74, just resolved. I don't know if I'd really call them XSS issues: they are on the admin pages; if a malicious user has access to them, you've got bigger problems than them try

Re: 15 minute hang in IndexInput.clone() involving finalizers

2006-12-16 Thread Chuck Williams
The problem appears to be this. We have an approximately 1 million item index. It uses 6 parallel subindexes with ParallelReader, so each of these subindexes has 1 million items. Each subindex has the same segment structure, with 15 segments in each at the moment. I mentioned before that the is