[jira] Updated: (LUCENE-857) Remove BitSet caching from QueryFilter

2007-04-09 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-857: Attachment: LUCENE-857.refactoring-approach.diff An example of what I'm thinking would make sense from a ba

[jira] Commented: (LUCENE-857) Remove BitSet caching from QueryFilter

2007-04-09 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487679 ] Hoss Man commented on LUCENE-857: - I don't think it's a question of being careless about reading the Changelog -- I

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487675 ] Marvin Humphrey commented on LUCENE-584: DisjunctionSumScorer (the ORScorer) actually calls Scorer.score() on

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487674 ] Otis Gospodnetic commented on LUCENE-584: - A. I'll look at the patch again tomorrow and follow what you

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487667 ] Doron Cohen commented on LUCENE-584: > No Scorer, no BooleanScorer(2), no ConjunctionScorer... Thanks, I was re

[jira] Updated: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

2007-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: spanhighlighter5.patch Apologize for the delay on this -- I was pulled into a busy produc

Re: Large scale sorting

2007-04-09 Thread Paul Smith
A memory saving optimization would be to not load the corresponding String[] in the string index (as discussed previously), but there is currently no way to tell the FieldCache that the strings are unneeded. The String values are only needed for merging results in a MultiSearcher. Yep, which hap

Re: Large scale sorting

2007-04-09 Thread Yonik Seeley
On 4/9/07, jian chen <[EMAIL PROTECTED]> wrote: But, on a higher level, my idea is really just to create an array of integers for each sort field. The array length is NumOfDocs in the index. Each integer corresponds to a displayable string value. For example, if you have a field of different colo
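
A minimal sketch of the quoted idea, for illustration only: one integer per document, where each integer stands for a displayable string value. The class and method names below are hypothetical and are not part of Lucene. The sketch also assumes every document has a non-null value for the field and that the integers are assigned in sorted order of the distinct values, so that comparing integers gives the same order as comparing strings.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Lucene.
final class OrdinalSortSketch {
    // Sorted distinct display values; the position of a value is its ordinal.
    final String[] lookup;
    // One ordinal per document: ords[docId] indexes into lookup.
    final int[] ords;

    // valueByDoc[docId] is the display value of the sort field (assumed non-null).
    OrdinalSortSketch(String[] valueByDoc) {
        lookup = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        ords = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            ords[doc] = Arrays.binarySearch(lookup, valueByDoc[doc]);
        }
    }

    // Orders matching doc ids by comparing integers only; ties fall back to doc id.
    int[] sortDocs(int[] matchingDocs) {
        return Arrays.stream(matchingDocs)
                .boxed()
                .sorted(Comparator.<Integer>comparingInt(d -> ords[d])
                        .thenComparingInt(d -> d))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    // Strings are consulted only when a value has to be displayed.
    String displayValue(int doc) {
        return lookup[ords[doc]];
    }
}
```

With the values red, blue, green, blue for documents 0 to 3, the lookup table is blue, green, red and the per-document array is 2, 0, 1, 0. Sorting touches only the integer array, which is the memory saving the proposal is after.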

Re: Large scale sorting

2007-04-09 Thread jian chen
Hi, Paul, I think to warm-up or not, it needs some benchmarking for specific application. For the implementation of the sort fields, when I talk about norms in Lucene, I am thinking we could borrow the same implementation of the norms to do it. But, on a higher level, my idea is really just to c

Re: Large scale sorting

2007-04-09 Thread Paul Smith
In our application, we have to sync up the index pretty frequently, the warm-up of the index is killing it. Yep, it speeds up the first sort, but at the cost of making all the others slower (maybe significantly so). That's obviously not ideal but could make use of sorts in larger index

Re: Large scale sorting

2007-04-09 Thread jian chen
Hi, Paul, Thanks for your reply. For your previous email about the need for disk based sorting solution, I kind of agree about your points. One incentive for your approach is that we don't need to warm-up the index anymore in case that the index is huge. In our application, we have to sync up th

[jira] Commented: (LUCENE-859) Expose the number of deleted docs in index/segment

2007-04-09 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487644 ] Yonik Seeley commented on LUCENE-859: - > Though it might still be handy to have something with main() that spits

Re: Large scale sorting

2007-04-09 Thread Doug Cutting
Paul Smith wrote: I don't disagree with the premise that it involves substantial I/O and would increase the time taken to sort, and why this approach shouldn't be the default mechanism, but it's not too difficult to build a disk I/O subsystem that can allocate many spindles to service this and

Re: Large scale sorting

2007-04-09 Thread Paul Smith
Now, if we could use integers to represent the sort field values, which is typically the case for most applications, maybe we can afford to have the sort field values stored in the disk and do disk lookup for each document matched? The look up of the sort field value will be as simple as

Re: Large scale sorting

2007-04-09 Thread Paul Smith
On 10/04/2007, at 4:18 AM, Doug Cutting wrote: Paul Smith wrote: Disadvantages to this approach: * It's a lot more I/O intensive I think this would be prohibitive. Queries matching more than a few hundred documents will take several seconds to sort, since random disk accesses are requir

[jira] Commented: (LUCENE-859) Expose the number of deleted docs in index/segment

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487640 ] Otis Gospodnetic commented on LUCENE-859: - Though it might still be handy to have something with main() that

[jira] Closed: (LUCENE-859) Expose the number of deleted docs in index/segment

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic closed LUCENE-859. --- Resolution: Won't Fix Lucene Fields: [New, Patch Available] (was: [Patch Available, Ne

[jira] Commented: (LUCENE-859) Expose the number of deleted docs in index/segment

2007-04-09 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487635 ] Yonik Seeley commented on LUCENE-859: - Isn't this redundant with existing IndexReader methods? deletedDocs() ==

Re: Large scale sorting

2007-04-09 Thread jian chen
Hi, Doug, I have been thinking about this as well lately and have some thoughts similar to Paul's approach. Lucene has the norm data for each document field. Conceptually it is a byte array with one byte for each document field. At query time, I think the norm array is loaded into memory the fir
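
A rough sketch of the layout described above: conceptually a byte array per field with one byte per document. The snippet is cut off, so the load-on-first-use behaviour shown here is an assumption about where the sentence was heading. `LazyPerDocBytes` and its loader function are hypothetical names, not Lucene classes; the loader stands in for whatever reads the bytes from the index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration; not a Lucene class.
final class LazyPerDocBytes {
    // field name -> one byte per document, filled on first access
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Stand-in for reading a field's per-document bytes from disk.
    private final Function<String, byte[]> loader;

    LazyPerDocBytes(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    // Loads the whole array for the field once, then serves lookups from memory.
    byte valueFor(String field, int docId) {
        return cache.computeIfAbsent(field, loader)[docId];
    }
}
```

The first lookup for a field pays the full load cost and later lookups are plain array reads, which is the warm-up trade-off debated elsewhere in this thread.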

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487631 ] Otis Gospodnetic commented on LUCENE-584: - Doron: just to address your question from Apr/7 - I expect/hope to

[jira] Updated: (LUCENE-859) Expose the number of deleted docs in index/segment

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated LUCENE-859: Attachment: LUCENE-859 El patcho. > Expose the number of deleted docs in index/segment >

[jira] Created: (LUCENE-859) Expose the number of deleted docs in index/segment

2007-04-09 Thread Otis Gospodnetic (JIRA)
Expose the number of deleted docs in index/segment -- Key: LUCENE-859 URL: https://issues.apache.org/jira/browse/LUCENE-859 Project: Lucene - Java Issue Type: New Feature Components:

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487617 ] Doron Cohen commented on LUCENE-848: Seems okay to me (since it's all in the benchmark). > Add supported for Wik

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487616 ] Doron Cohen commented on LUCENE-584: > > When you rerun, you may want to use my alg - to compare the two approach

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Mike Klaas (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487613 ] Mike Klaas commented on LUCENE-584: --- Instead of discarding the first run, the approach I usually take is to run 3-4

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487609 ] Steven Parkes commented on LUCENE-848: -- That's what I meant (and did). If it's okay, I'll bundle it into 848.

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487608 ] Doron Cohen commented on LUCENE-848: > Also, I was going to add support to the algorithm format for setting max

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487600 ] Steven Parkes commented on LUCENE-848: -- By the way, that's a rough patch. I'm cleaning it up as I use it to test

[jira] Updated: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-09 Thread Andy Liu (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Liu updated LUCENE-855: Attachment: TestRangeFilterPerformanceComparison.java Here's my new benchmark. > MemoryCachedRangeFilter t

[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-09 Thread Andy Liu (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487595 ] Andy Liu commented on LUCENE-855: - In your updated benchmark, you're combining the range filter with a term query th

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487594 ] Yonik Seeley commented on LUCENE-584: - > When you rerun, you may want to use my alg - to compare the two approach

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt This patch is a first cut at Wikipedia benchmark support. It downloads

[jira] Assigned: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic reassigned LUCENE-855: --- Assignee: Otis Gospodnetic > MemoryCachedRangeFilter to boost performance of Range qu

[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487590 ] Otis Gospodnetic commented on LUCENE-855: - Comments about the patch so far: Cosmetics: - You don't want to re

Re: optimize() method call

2007-04-09 Thread Doug Cutting
Otis Gospodnetic wrote: I'd advise against calling optimize() at all in an environment whose indices are constantly updated. +1 Doug

Re: Large scale sorting

2007-04-09 Thread Doug Cutting
Paul Smith wrote: Disadvantages to this approach: * It's a lot more I/O intensive I think this would be prohibitive. Queries matching more than a few hundred documents will take several seconds to sort, since random disk accesses are required per matching document. Such an approach is only

Re: Progressive Query Relaxation

2007-04-09 Thread J. Delgado
The idea is to efficiently get the desired result set (top N) at once without having to re-run different queries inside the application logic. Query relaxation avoids having several round trips and possibly could be offered with and without deduplication. Maybe this is a feature required for Solr

[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487587 ] Otis Gospodnetic commented on LUCENE-855: - OK. I'll wait for the new performance numbers before committing.

[jira] Resolved: (LUCENE-853) Caching does not work when using RMI

2007-04-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved LUCENE-853. - Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, Ne

[jira] Updated: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-09 Thread Matt Ericson (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Ericson updated LUCENE-855: Attachment: FieldCacheRangeFilter.patch This version will create a real BitSet() when cloned and wi

Re: Progressive Query Relaxation

2007-04-09 Thread Otis Gospodnetic
Not that I know of. One typically puts that in application logic and re-runs or offers to run alternative queries. No de-duping there, unless you do it in your app. I think one problem with the described approach and Lucene would be that Lucene's scores are not "absolute". Otis . . . . . .
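
The application-level approach described above (re-run alternative queries and de-duplicate in the app) could look roughly like the sketch below. It is an assumption-laden illustration: each query step is modeled as a hypothetical `Supplier` returning doc ids in rank order, and no Lucene API is used. Results are kept in step order instead of being merged by score, which sidesteps the point that scores from different queries are not directly comparable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration; not part of Lucene or Solr.
final class ProgressiveRelaxationSketch {
    // Runs queries from strictest to most relaxed until n distinct doc ids are found.
    static List<Integer> topN(List<Supplier<List<Integer>>> queriesStrictToLoose, int n) {
        // LinkedHashSet de-duplicates while keeping first-seen order.
        Set<Integer> seen = new LinkedHashSet<>();
        for (Supplier<List<Integer>> query : queriesStrictToLoose) {
            if (seen.size() >= n) {
                break; // enough results: looser queries are never executed
            }
            for (int docId : query.get()) {
                if (seen.size() >= n) {
                    break;
                }
                seen.add(docId);
            }
        }
        return new ArrayList<>(seen);
    }
}
```

A caller would pass, for example, a phrase query first, then an AND query, then an OR query, each wrapped as a supplier. A looser step runs only if the stricter ones did not fill the result list.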

[jira] Created: (LUCENE-858) link from Lucene web page to API docs

2007-04-09 Thread Daniel Naber (JIRA)
link from Lucene web page to API docs - Key: LUCENE-858 URL: https://issues.apache.org/jira/browse/LUCENE-858 Project: Lucene - Java Issue Type: Improvement Reporter: Daniel Naber Assi

Re: linking the API docs

2007-04-09 Thread Grant Ingersoll
Hi Daniel, Can you file this as an issue and assign it to me? Nigel and I are working through a few things w/ Hudson and the docs, still. The gist of it is that the API and website will be put back on people.a.o. This will mean that a relative link like api/overview-summary.html#overvi

Progressive Query Relaxation

2007-04-09 Thread J. Delgado
Has anyone within the Lucene or Solr community attempted to code a progressive query relaxation technique similar to the one described here for Oracle Text? http://www.oracle.com/technology/products/text/htdocs/prog_relax.html Thanks, -- J.D.