[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-10 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463830 ] Doron Cohen commented on LUCENE-675: Oops... I had the impression that compiling with compliance level 1.4 is su

[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-10 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-675: --- Attachment: byTask.jre1.4.patch.txt > Lucene benchmark: objective performance test for Lucene > -

committing for ego-reasons

2007-01-10 Thread karl wettin
To make it easier for me to keep up to date with the trunk, it would be very nice if I got issue 550 committed. Any committer with too much time that would like to throw an eye at it? It has been running live in multiple heavy- and low load environments for quite some time now, so it ought

[jira] Reopened: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-10 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic reopened LUCENE-741: - The norm-removing functionality was bogus - it simply "normalized the norms" to be 1 for the

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-10 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463792 ] Grant Ingersoll commented on LUCENE-675: Hey Doron, Your patch uses JDK 1.5. I am assuming it is safe to u

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463787 ] Yonik Seeley commented on LUCENE-769: - Sorry for some of the redundant comments... Chuck's comment wasn't visible

Lockless commits -- great stuff!

2007-01-10 Thread Marvin Humphrey
Greets, I've finished integrating the lockless commits concept into KinoSearch, and I wanted to pop in and say that it's a very nice piece of work. Real outside-the-box thinking -- or at least outside my box. :) Nothing better than an innovation which solves long-standing problems AND

[jira] Commented: (LUCENE-140) docs out of order

2007-01-10 Thread Jed Wesley-Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463781 ] Jed Wesley-Smith commented on LUCENE-140: - Michael, Doron, you guys are legends! Indeed the problem is using

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
The idea of "impact" and "impact-sorted posting list" should practically work with boolean model by approximation in the following way: (1) Index Structure Inverted-Index : * posting-list: + (sorted by impact) occurrence: position (2) Retrieval Algorithm for boolean query "a AND b" set an impa

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463761 ] Yonik Seeley commented on LUCENE-769: - Those performance numbers don't make sense to me. Why would DocCaching sor

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
A small correction: Impact does not have to be per occurrence of a term, but rather most likely per aggregation of all occurrences of a term in a document (per pair of term and doc). Thus you just aggregate the significance of occurrences in different regions of a doc at index time and put the
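
The aggregation described in this message can be sketched roughly as follows. This is only an illustrative sketch in plain Python, not Lucene code and not the poster's implementation: the function name `build_impact_index`, the `REGION_WEIGHTS` table and its values, and the whitespace tokenizer are all assumptions made for the example. It sums a weight for every occurrence of a term into one impact per (term, doc) pair, then sorts each posting list by descending impact.

```python
from collections import defaultdict

# Hypothetical per-region weights; the thread does not give any values.
REGION_WEIGHTS = {"title": 3.0, "body": 1.0}


def build_impact_index(docs):
    """Return term -> [(doc_id, impact), ...] sorted by descending impact.

    docs maps doc_id -> {region_name: text}. Tokenization is a plain
    whitespace split, which is an assumption made for this sketch.
    """
    impacts = defaultdict(lambda: defaultdict(float))
    for doc_id, regions in docs.items():
        for region, text in regions.items():
            weight = REGION_WEIGHTS.get(region, 1.0)
            for term in text.lower().split():
                # Fold every occurrence into one impact per (term, doc) pair.
                impacts[term][doc_id] += weight
    return {
        # Highest impact first; ties are broken by ascending doc id.
        term: sorted(per_doc.items(), key=lambda item: (-item[1], item[0]))
        for term, per_doc in impacts.items()
    }


docs = {
    1: {"title": "lucene index", "body": "index design notes"},
    2: {"title": "search", "body": "lucene lucene index"},
}
index = build_impact_index(docs)
```

With these made-up weights, a single title occurrence of "lucene" in document 1 outranks two body occurrences in document 2, so document 1 comes first in that term's posting list.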

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
Just my two cents, I think what he meant by "single field" is the following: If the concept of "field" was introduced to differentiate the significance of term occurrences in different regions of a document, (e.g., the occurrence in title is more important than in body, etc), that significance can b

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
I have a couple of questions about the original post of the new index design: (1) Question on the posting list > > f. ,],...[docN, freq > > > > ,]) What is the "impact" per posting list? I am under the impression that "impact" or "frequency" is per pair of doc and term. And it seems that "impact

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread jian chen
Hi, Jeff, Also, how to handle the phrase based queries? For example, here are two posting lists: TermA: X Y TermB: Y X I am not sure how you would return document X or Y for a search of the phrase "TermA Term B". Which should come first? Thanks, Jian On 1/9/07, Dalton, Jeffery <[EMAIL PROTE

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread jian chen
Hi, Jeff, I like the idea of impact based scoring. However, could you elaborate more on why we only need to use a single field at search time? In Lucene, the indexed terms are field specific, and two terms, even if they are the same, are still different terms if they are of different fields. So,

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Chuck Williams (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463729 ] Chuck Williams commented on LUCENE-769: --- The test case uses only tiny documents, and the reported timings for m

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463727 ] Hoss Man commented on LUCENE-769: - Artem: I've only skimmed your patch briefly, but I have a few comments: 1) since

Re: Lucene Scalability Question

2007-01-10 Thread robert engels
It appears the submitter is working at solving all of these issues - basically a pluggable index. You should review his emails on the subject. On Jan 10, 2007, at 3:03 PM, J. Delgado wrote: This sounds very interesting... I'll definitely have a look into it. However I have the feeling that, l

Re: Lucene Scalability Question

2007-01-10 Thread J. Delgado
This sounds very interesting... I'll definitely have a look into it. However I have the feeling that, like the use of Oracle Text, this is keeping separate the underlying data structures used for evaluating full-text and conditions over other data types, which brings up other issues when trying to

Re: Lucene Scalability Question

2007-01-10 Thread robert engels
There is a module in Lucene contrib that changes that! It loads Lucene into the Oracle database (it has a JVM), and allows Lucene syntax to perform full-text searching. On Jan 10, 2007, at 2:37 PM, J. Delgado wrote: No, Oracle Text does not use Lucene. It has its own proprietary full-text e

Re: Lucene Scalability Question

2007-01-10 Thread Grant Ingersoll
On Jan 10, 2007, at 3:37 PM, J. Delgado wrote: No, Oracle Text does not use Lucene. It has its own proprietary Someone has contributed code that allows Lucene to be run inside of Oracle's JVM, this is different from Oracle Text. Search the User/Dev list for recent posts on Oracle and

Re: Lucene Scalability Question

2007-01-10 Thread Steven Rowe
J. Delgado wrote: > I'm looking to hear new ideas people may have to solve this very hard > problem. https://issues.apache.org/jira/browse/LUCENE-724

Re: Lucene Scalability Question

2007-01-10 Thread J. Delgado
No, Oracle Text does not use Lucene. It has its own proprietary full-text engine. It represents documents, the inverted index and relationships in a DB schema and it depends heavily on the SQL layer. This has some severe limitations though... Of course, you can push structured data into full-text

Re: Lucene Scalability Question

2007-01-10 Thread robert engels
I think the contrib 'Oracle Full Text' does this (although in the reverse). It uses Lucene for full text queries (embedded into the db), the query analyzer works. It is really a great piece of software. Too bad it can't be done in a standard way so that it would work with all dbs. I thin

Re: Lucene Scalability Question

2007-01-10 Thread J. Delgado
This is a more general question: Given the fact that most applications require querying a combination of full-text and structured data, has anyone looked into building data structures at the most fundamental level (e.g. combination of b-tree and inverted lists) that would enable scalable and perf

Re: Lucene Scalability Question

2007-01-10 Thread Chris Hostetter
: So you mean lucene can't do better than this ? robert's point is that based on what you've told us, there is no reason to think Lucene makes sense for you -- if *all* you are doing is finding documents based on numeric ranges, then a relational database is better suited to your task. if you ac

Re: Lucene Scalability Question

2007-01-10 Thread Ali Salehi
So you mean lucene can't do better than this ? Best, On Wed, 10 Jan 2007 20:53:33 +0100, robert engels <[EMAIL PROTECTED]> wrote: I think you need a database, not Lucene - especially since you are not even using any text ! On Jan 10, 2007, at 1:39 PM, Ali Salehi wrote: Hi, Thanks for

Re: Lucene Scalability Question

2007-01-10 Thread robert engels
I think you need a database, not Lucene - especially since you are not even using any text ! On Jan 10, 2007, at 1:39 PM, Ali Salehi wrote: Hi, Thanks for your previous mail. Now I changed the configuration to use merging factor 50. I also disabled the compound file parameter. I still h

[jira] Updated: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Vasiliev updated LUCENE-769: -- Attachment: DocCachingSorting.patch > [PATCH] Performance improvement for some cases of sorted

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463682 ] Artem Vasiliev commented on LUCENE-769: --- fork="true" maxmemory="105m" attributes need to be added to task for

Re: Lucene Scalability Question

2007-01-10 Thread Ali Salehi
Hi, Thanks for your previous mail. Now I changed the configuration to use merging factor 50. I also disabled the compound file parameter. I still have the search problem I had before, now search takes around 750 msecs for a small set of documents. [java] Total Query Processing time (msec) :

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463678 ] Artem Vasiliev commented on LUCENE-769: --- Also tried this with DOCS_NUM == 1,000,000. Right after index creation

[jira] Commented: (LUCENE-140) docs out of order

2007-01-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463674 ] Michael McCandless commented on LUCENE-140: --- > BTW. We have looked at all the open files referenced by the

[jira] Commented: (LUCENE-542) QueryParser doesn't support keywords staring with *

2007-01-10 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463659 ] Steven Parkes commented on LUCENE-542: -- Actually, this has been changed as of the commit of LUCENE-489. It's dis

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-10 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463652 ] Artem Vasiliev commented on LUCENE-769: --- I've tried the test with DOCS_NUM == 10,000,000. DocCaching sort took

Re: [Fwd: Re: BUILD for lucene (#45) on localhost TIMED OUT: Timed out after 60 minutes and was stopped.]

2007-01-10 Thread Slava Imeshev
Michael, Sorry for the confusion. Please disregard this error. We are in the process of migrating to better hardware so this should get fixed soon. Thank you. Slava --- Michael McCandless <[EMAIL PROTECTED]> wrote: > Grant Ingersoll wrote: > > This is not our nightly build. I'm not sure who is r

Re: [Fwd: Re: BUILD for lucene (#45) on localhost TIMED OUT: Timed out after 60 minutes and was stopped.]

2007-01-10 Thread Slava Imeshev
Michael, Sorry for the confusion. Please disregard this error. We are in the process of migrating to better hardware so this should get fixed soon. Thank you. Slava --- Michael McCandless <[EMAIL PROTECTED]> wrote: > > Forwarding to java-dev for help on what we should do about this > particular

[jira] Resolved: (LUCENE-767) maxDoc should be explicitly stored in the index, not derived from file length

2007-01-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-767. --- Resolution: Fixed Fix Version/s: 2.1 > maxDoc should be explicitly stored in t

[jira] Commented: (LUCENE-542) QueryParser doesn't support keywords staring with *

2007-01-10 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463564 ] Steven Rowe commented on LUCENE-542: Hi Jianwu, See the FAQ entry: http://wiki.apache.org/jakarta-lucene/LuceneF

Re: [Fwd: Re: BUILD for lucene (#45) on localhost TIMED OUT: Timed out after 60 minutes and was stopped.]

2007-01-10 Thread Michael McCandless
Grant Ingersoll wrote: This is not our nightly build. I'm not sure who is running this. The Lucene nightly build should email java-dev if there is a problem. It seems to have gone through fine last night, except the odd thing is the file formats change did not go through, despite it being in

Re: [Fwd: Re: BUILD for lucene (#45) on localhost TIMED OUT: Timed out after 60 minutes and was stopped.]

2007-01-10 Thread Grant Ingersoll
This is not our nightly build. I'm not sure who is running this. The Lucene nightly build should email java-dev if there is a problem. It seems to have gone through fine last night, except the odd thing is the file formats change did not go through, despite it being in SVN. I am going t

[Fwd: Re: BUILD for lucene (#45) on localhost TIMED OUT: Timed out after 60 minutes and was stopped.]

2007-01-10 Thread Michael McCandless
Forwarding to java-dev for help on what we should do about this particular build failure. Apparently on a build timeout, only the individuals who had committed that day are emailed? But on other build failures I think java-dev is emailed? It seems to me like the machine doing the build is ov

[jira] Commented: (LUCENE-140) docs out of order

2007-01-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463524 ] Michael McCandless commented on LUCENE-140: --- OK from that indexing-failure.log (thanks Jed!) I can see that

Re: Payloads

2007-01-10 Thread Nadav Har'El
On Mon, Jan 08, 2007, Nicolas Lalevée wrote about "Re: Payloads": > I have looked closer to how lucene index, and I realized that for the facet > feature, the kind of payload handling by Michael's patch are not designed for > that. In this patch, the payloads are in the posting, ie in the tis, fr