RE: CHANGES questions

2009-09-21 Thread Uwe Schindler
> I've been reading through CHANGES.txt and had a few questions/comments: > > 1. The attribute entry still says Token is deprecated. I can fix, but > isn't a huge deal. Another one? +1 for changing. > 2. L-1658 talks about changing FSDirectory for SimpleDirectory and > adds a static open() meth

Re: CHANGES questions

2009-09-21 Thread Michael McCandless
On Sun, Sep 20, 2009 at 7:40 PM, Mark Miller wrote: > Mark Miller wrote: > > Something along the lines of: > >  * LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory >    (but left an FSDirectory base class).  Added an FSDirectory.open >    static method to pick a good default FSDir

RE: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Uwe Schindler
And inline in your diff we have the deprecated Token class: > * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class > called > AttributeSource instead of the now deprecated Token class. All > attributes > that the Token class had have been moved into separate classes: > @

Re: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Mark Miller
Uwe Schindler wrote: > And inline in your diff we have the deprecated Token class: > > >> * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class >> called >> AttributeSource instead of the now deprecated Token class. All >> attributes >> that the Token class had have bee

RE: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Uwe Schindler
This was the answer about your first commit (merge FSDir stuff). At the time I posted the answer, you fixed the deprecated Token thing :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark Miller [mai

Re: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Mark Miller
I see what you mean! My first change had the dep token piece in the diff. That's a funny coincidence. Threw me for a loop. Uwe Schindler wrote: > This was the answer about your first commit (merge FSDir stuff). At the time > I posted the answer, you fixed the deprecated Token thing :-) > > -

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Grant Ingersoll wrote: > > On Sep 17, 2009, at 3:07 PM, Mark Miller wrote: > >> So in the section: Building the Release artifacts >> >> bullet 8: Make sure that for each release file an md5 checksum file >> exists. >> >> At this step in the process, the zip/tars do not have an md5 checksum >> file

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Yonik Seeley wrote: > On Thu, Sep 17, 2009 at 6:42 PM, Mark Miller wrote: > >> Okay - I see the checksum stuff in build.xml - it just doesn't appear to >> be working in my favor at the moment ... >> > > You can just use md5sum from the command line too. > > -Yonik > http://www.lucidimagina

RE: ReleaseTodo steps

2009-09-21 Thread Uwe Schindler
> Oddly though, while all of the Maven hashes are in a file thats 32bytes, > when I save this hash, its 33bytes. > > Any thoughts? Line feed? - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional comma
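The "Line feed?" guess above can be checked with a small sketch. An MD5 digest written as hex is always 32 characters, so a 33-byte hash file is consistent with the digest plus one trailing line feed. The class and method names below (`HashFileSize`, `md5Hex`, `fileSize`) are made up for illustration and are not part of the Lucene build.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Lucene or its build scripts.
class HashFileSize {
    /** Returns the lowercase hex MD5 digest of the given bytes (always 32 characters). */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Size in bytes of a hash file holding the digest, with or without a trailing line feed. */
    static int fileSize(String hex, boolean trailingLineFeed) {
        String content = trailingLineFeed ? hex + "\n" : hex;
        return content.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String hex = md5Hex("example".getBytes(StandardCharsets.UTF_8));
        System.out.println(fileSize(hex, false)); // digest only
        System.out.println(fileSize(hex, true));  // digest plus line feed
    }
}
```

If the saved file needs to match the 32-byte Maven hash files, writing the digest without a line terminator should do it.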

Re: ReleaseTodo steps

2009-09-21 Thread John Wang
Hi Guys: A quick comment on 2.9 release: org.apache.lucene.Weight interface has been changed to an abstract class. This is a non-backward compatible change and would break many custom Query implementations. Is this intentional? Thanks -John On Mon, Sep 21, 2009 at 8:59 PM, Uwe Schindler wr

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Uwe Schindler wrote: >> Oddly though, while all of the Maven hashes are in a file thats 32bytes, >> when I save this hash, its 33bytes. >> >> Any thoughts? >> > > Line feed?

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Yeah it is, sorry :( Check out the back compat break section in changes - it's the first section I think. John Wang wrote: > Hi Guys: > > A quick comment on 2.9 release: > > org.apache.lucene.Weight interface has been changed to an abstract > class. This is a non-backward compatible change and

TermCount per fiend

2009-09-21 Thread John Wang
Hi guys: Not sure if this would be a better fit on the users or the dev list. It would be very useful to be able to get term count given a field, e.g. int IndexReader.termCount(String field) Wanted to get your opinion on what is the best way to approach this. After looking th

Re: ReleaseTodo steps

2009-09-21 Thread John Wang
Thanks Mark for the clarification! -John On Mon, Sep 21, 2009 at 9:09 PM, Mark Miller wrote: > Yeah it is, sorry :( > > Check out the back compat break section in changes - its the first > section I think. > > John Wang wrote: > > Hi Guys: > > > > A quick comment on 2.9 release: > > > > org.

Re: ReleaseTodo steps

2009-09-21 Thread Yonik Seeley
On Mon, Sep 21, 2009 at 8:56 AM, Mark Miller wrote: > Have you done this before Yonik? > > md5sum generates a hash line like this: > a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz Remove the '*' character? >1. Lucene 2.4.1 doesn't seem to have these md5 hashes for the non maven > artifact

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Thanks! I assumed you dropped the second part entirely, because the Maven artifact md5's only appear to have the hash. Your link to the dist with the non Maven md5's clears that up though. I guess the mirrors just don't have the md5 files. bq. All of the old releases used to be there, but they wer

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Yonik Seeley wrote: > On Mon, Sep 21, 2009 at 8:56 AM, Mark Miller wrote: > >> Have you done this before Yonik? >> >> md5sum generates a hash line like this: >> a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz >> > > Remove the '*' character? > Oddly, my version of md5sum considers

[no subject]

2009-09-21 Thread Thomas D'Silva
I would like to contribute a class based on the MoreLikeThis class in contrib/queries that generates a query based on the tags associated with a document. The class assumes that documents are tagged with a set of tags (which are stored in the index in a separate Field). The class determines the top

2.9 vote

2009-09-21 Thread Mark Miller
Uploading 2.9 vote candidate as I type. Gonna check it out a bit more after the upload too, but when it's up, I *think* we are ready to begin the vote process. I'll send out an official vote start email a bit later (I've got to CC the general mailing list as well). Hopefully I haven't screwed up

[jira] Commented: (LUCENE-1910) Extension to MoreLikeThis to use tag information

2009-09-21 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757924#action_12757924 ] Mark Harwood commented on LUCENE-1910: -- Hi Thomas, Following your request for feedbac

RE: svn commit: r817286 - /lucene/java/site/docs/doap.rdf

2009-09-21 Thread Steven A Rowe
Mark Miller wrote: > > > +Lucene 2.9.0 > +2009-09-23 > +2.9.0 > + Stupid question from the peanut gallery: Doesn't a VOTE require 3 days? I ask because (3 + 2009-09-21) = 2009-09-24, not -23. Steve
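The date arithmetic in the question can be reproduced with `java.time`. This is only a sketch: it takes the three-day figure from the question as given, and the class and method names (`VoteWindow`, `earliestClose`) are invented for the example.

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the date arithmetic in the question.
class VoteWindow {
    /** Returns the first date on which a vote started on voteStart could close after the given number of days. */
    static LocalDate earliestClose(LocalDate voteStart, int days) {
        return voteStart.plusDays(days);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2009, 9, 21);
        System.out.println(earliestClose(start, 3)); // prints 2009-09-24
    }
}
```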

Re: svn commit: r817286 - /lucene/java/site/docs/doap.rdf

2009-09-21 Thread Mark Miller
Steven A Rowe wrote: > Mark Miller wrote: > >> >> >> +Lucene 2.9.0 >> +2009-09-23 >> +2.9.0 >> + >> > > Stupid question from the peanut gallery: > > Doesn't a VOTE require 3 days? I ask because (3 + 2009-09-21) = 2009-09-24, > not -23. > > Steve

Re: svn commit: r817286 - /lucene/java/site/docs/doap.rdf

2009-09-21 Thread Yonik Seeley
On Mon, Sep 21, 2009 at 12:45 PM, Mark Miller wrote: > I actually almost sent an email questioning it, but the day is supposed > to be an estimate, so I figure its likely to be off a day or two anyway. +1, don't worry about it. Need to wait for mirrors to sync anyway, so it's often the day after

[VOTE] Release Lucene 2.9.0

2009-09-21 Thread Mark Miller
Okay, let's give this a shot: The (proposed) release artifacts have been built and are up at: http://people.apache.org/~markrmiller/staging-area/lucene2.9/ The changes are here: http://people.apache.org/~markrmiller/staging-area/lucene2.9changes/ Please vote to officially release these artifac

[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757961#action_12757961 ] Michael McCandless commented on LUCENE-1781: bq. Can we go with my patch, and

[jira] Created: (LUCENE-1921) Absurdly large radius (miles) search fails to include entire earth

2009-09-21 Thread Michael McCandless (JIRA)
Absurdly large radius (miles) search fails to include entire earth -- Key: LUCENE-1921 URL: https://issues.apache.org/jira/browse/LUCENE-1921 Project: Lucene - Java Issue Type:

[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757963#action_12757963 ] Michael McCandless commented on LUCENE-1781: I opened LUCENE-1921. > Large di

Re: TermCount per fiend

2009-09-21 Thread Michael McCandless
MultiReaders can't quickly compute the exact term count. Would they be allowed to throw UOE? (Like IndexReader.getUniqueTermCount) TermsHashPerField.numPostings (not .numPostingsInt) tells you the # unique terms currently in IndexWriter's RAM buffer, so I think we could save that out with FieldI

[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757972#action_12757972 ] Michael McCandless commented on LUCENE-1781: Mark is it OK to commit this now?

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread Jason Rutherglen
John, It would be great if Lucene's benchmark were used so everyone could execute the test in their own environment and verify. The settings and code used to generate the results aren't clear, so it's difficult to draw any reliable conclusions. The steep spike shows greater evidence for the IO ca

[jira] Commented: (LUCENE-1917) ShingleFilter include words

2009-09-21 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758056#action_12758056 ] Jason Rutherglen commented on LUCENE-1917: -- I'm going to port SOLR-908 rather tha

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
Jason: Before jumping to any conclusions, let me describe the test setup. It is rather different from Lucene benchmark as we are testing high updates in a realtime environment: We took a public corpus: medline, indexed to approximately 3 million docs. And update all the docs over and over

Re: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Mark Miller
+1 - commit away. - Mark http://www.lucidimagination.com (mobile) On Sep 21, 2009, at 2:08 PM, "Michael McCandless (JIRA)" > wrote: [ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757972#action_1

Welcome, Koji

2009-09-21 Thread Michael McCandless
A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi! Koji has given us the FastVectorHighlighter and CharFilter, among other fun things. He's also a committer in Solr. Welcome aboard! Mike

Re: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless
Super, will do! Mike On Mon, Sep 21, 2009 at 7:52 PM, Mark Miller wrote: > +1 - commit away. > > - Mark > > http://www.lucidimagination.com (mobile) > > On Sep 21, 2009, at 2:08 PM, "Michael McCandless (JIRA)" > wrote: > >> >>   [ >> https://issues.apache.org/jira/browse/LUCENE-1781?page=com.at

[jira] Resolved: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1781. Resolution: Fixed Fix Version/s: (was: 3.1) 2.9 Than

Re: TermCount per fiend

2009-09-21 Thread John Wang
Thanks Michael! Makes lotta sense to me to wait for LUCENE-1458 then. Should I create an issue with a dependency on 1458? One application for this is within FieldCache construction of StringIndex: If we know the number of terms is small, the orderArray using an int per doc is wasteful. In the cas
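The point about an int per doc being wasteful when a field has few terms can be made concrete with a rough sketch. This is not Lucene's FieldCache code: the class `OrdWidth`, its methods, and the document and term counts are all assumptions chosen for illustration, and it simply picks the narrowest of 1, 2, or 4 bytes that can hold the largest ordinal.

```java
// Hypothetical illustration only; not how FieldCache.StringIndex is implemented.
class OrdWidth {
    /** Narrowest width in bytes (1, 2 or 4) able to hold ordinals in the range 0..maxOrd. */
    static int bytesPerOrd(int maxOrd) {
        if (maxOrd < 0) {
            throw new IllegalArgumentException("maxOrd must be non-negative");
        }
        if (maxOrd <= 0xFF) {
            return 1;
        }
        if (maxOrd <= 0xFFFF) {
            return 2;
        }
        return 4;
    }

    /** Approximate payload size of a per-document ordinal array, ignoring object headers. */
    static long arrayBytes(int maxDoc, int maxOrd) {
        return (long) maxDoc * bytesPerOrd(maxOrd);
    }

    public static void main(String[] args) {
        int maxDoc = 3_000_000; // illustrative document count
        int maxOrd = 200;       // illustrative: a field with only a couple hundred unique terms
        System.out.println("int per doc:  " + (long) maxDoc * 4 + " bytes");
        System.out.println("narrowest:    " + arrayBytes(maxDoc, maxOrd) + " bytes");
    }
}
```

Choosing the width this way requires knowing the per-field term count before allocating the array, which is the motivation given for the proposed term count method.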

Re: TermCount per fiend

2009-09-21 Thread Michael McCandless
On Mon, Sep 21, 2009 at 8:11 PM, John Wang wrote: > Makes lotta sense to me to wait for LUCENE-1458 then. Should I create an > issue with a dependency on 1458? Yes please open a new issue. > One application for this is within FieldCache construction of StringIndex: > > If we know the number of t

[jira] Created: (LUCENE-1922) exposing the ability to get the number of unique term count per field

2009-09-21 Thread John Wang (JIRA)
exposing the ability to get the number of unique term count per field - Key: LUCENE-1922 URL: https://issues.apache.org/jira/browse/LUCENE-1922 Project: Lucene - Java Issue

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread Ted Dunning
John, I think that inherent in your test is a uniform distribution of updates. This seems unrealistic to me, not least because any distribution of updates caused by a population of objects interacting with each other should be translation invariant in time which is something a uniform distributio

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread Jason Rutherglen
I'm not sure I communicated the idea properly. If CMS is set to 1 thread, no matter how intensive the CPU for a merge, it's limited to 1 core of what is in many cases a 4 or 8 core server. That leaves the other 3 or 7 cores for queries, which if slow, indicates that it isn't the merging that's slow

Re: Welcome, Koji

2009-09-21 Thread Robert Muir
welcome! On Mon, Sep 21, 2009 at 8:06 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi! > > Koji has given us the FastVectorHighlighter and CharFilter, among > other fun things. He's also a committer in Solr. > > W

Re: Welcome, Koji

2009-09-21 Thread Koji Sekiguchi
Hello everyone, I'm happy to be a new member of the contrib committers of Lucene. I hope I can help to improve Lucene in 3.0 and the future. Currently, I run my own company, RONDHUIT, based in Tokyo. In the company, we provide Lucene/Solr consulting and support services for our customers. T

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
Hi Ted: In our case it is profile updates. Each profile -> 1 document keyed on member id. We do experience people updating their profile and the assumption is every member is likely to update their profile (that is a bit aggressive I'd agree, but it is nevertheless a safe upper bound)

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
Jason: You are missing the point. The idea is to avoid merging of large segments. The point of this MergePolicy is to balance segment merges across the index. The aim is not to have 1 large segment, it is to have n segments with balanced sizes. When the large segment is out of the IO

Re: Welcome, Koji

2009-09-21 Thread Mark Miller
Welcome aboard Koji! - Mark Koji Sekiguchi wrote: > Hello everyone, > > I'm happy to be a new member of the contrib committers of Lucene. > I hope I can help to improve Lucene in 3.0 and the future. > > Currently, I run my own company, RONDHUIT, based in Tokyo. > In the company, we provide L

[jira] Updated: (LUCENE-995) Add open ended range query syntax to QueryParser

2009-09-21 Thread Adriano Crestani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-995: Attachment: LUCENE-995_09_21_2009.patch The patch adds open ended range query to oal.queryP

Re: ReleaseTodo steps

2009-09-21 Thread Chris Hostetter
: md5sum generates a hash line like this: : a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz : : Then when you do a check, it knows what file to check against. : : The Maven artifacts just list the hash though. So it seems proper to : remove the second part and just put the hash? Some back
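The two file shapes discussed here (a full md5sum line versus a bare hash) can be told apart with a small parser. The sketch below assumes the GNU md5sum line layout: 32 hex characters, a space, a mode character (`*` for binary, space for text), then the file name. The class name `Md5sumLine` is invented for the example.

```java
// Hypothetical parser for a single md5sum output line; not part of the Lucene build.
class Md5sumLine {
    final String hash;
    final boolean binaryMode;
    final String fileName;

    private Md5sumLine(String hash, boolean binaryMode, String fileName) {
        this.hash = hash;
        this.binaryMode = binaryMode;
        this.fileName = fileName;
    }

    /** Parses a line of the form "<32 hex chars> <mode char><file name>". */
    static Md5sumLine parse(String line) {
        if (line.length() < 35 || line.charAt(32) != ' ') {
            throw new IllegalArgumentException("not an md5sum line: " + line);
        }
        String hash = line.substring(0, 32);
        if (!hash.matches("[0-9a-fA-F]{32}")) {
            throw new IllegalArgumentException("bad hash: " + hash);
        }
        char mode = line.charAt(33);
        if (mode != '*' && mode != ' ') {
            throw new IllegalArgumentException("bad mode character: " + mode);
        }
        return new Md5sumLine(hash, mode == '*', line.substring(34));
    }

    public static void main(String[] args) {
        Md5sumLine parsed = parse("a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz");
        System.out.println(parsed.hash);     // the part a hash-only file would keep
        System.out.println(parsed.fileName); // the part md5sum uses when checking
    }
}
```

Keeping only `hash` gives the bare form that the Maven artifacts are said to use; keeping the whole line lets a checksum check find the file by name.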

Re: Welcome, Koji

2009-09-21 Thread Shalin Shekhar Mangar
On Tue, Sep 22, 2009 at 5:36 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi! > > Koji has given us the FastVectorHighlighter and CharFilter, among > other fun things. He's also a committer in Solr. > > Welcome abo

Build failed in Hudson: Lucene-trunk #955

2009-09-21 Thread Apache Hudson Server
See -- [...truncated 15617 lines...] [junit] [junit] Testsuite: org.apache.lucene.queryParser.TestMultiAnalyzer [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.228 sec

2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Looking at the code, seems there is a disconnect between how/when field cache is loaded when IndexWriter.getReader() is called. Is FieldCache updated? Otherwise, are we reloading FieldCache for each reader instance? Seems for operations that lazily load field cache, e.g. sorting, this has a signif

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread Yonik Seeley
On Tue, Sep 22, 2009 at 12:56 AM, John Wang wrote: > Looking at the code, seems there is a disconnect between how/when field > cache is loaded when IndexWriter.getReader() is called. I'm not sure what you mean by "disconnect" > Is FieldCache updated? FieldCache entries are populated on demand,

Re: Build failed in Hudson: Lucene-trunk #955

2009-09-21 Thread Yonik Seeley
On Tue, Sep 22, 2009 at 12:44 AM, Apache Hudson Server wrote: > BUILD FAILED > :142: > The following error occurred while executing this line: >

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Hi Yonik: Actually that is what I am looking for. Can you please point me to where/how sorting is done per-segment? When heavy indexing introduces or modifies segments, would it cause reloading of FieldCache at query time and thus would impact search performance? thanks -John On Tu

RE: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Uwe Schindler
I thought, we are already in the voting phase? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Tuesday, September 22, 2009 1:52 AM > To: java-dev@lucene.

RE: Welcome, Koji

2009-09-21 Thread Uwe Schindler
Welcome Koji! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] > Sent: Tuesday, September 22, 2009 3:17 AM > To: java-dev@lucene.apache.org > Subject: Re: Welcom

Re: Welcome, Koji

2009-09-21 Thread John Wang
Congratulations Koji! -John On Tue, Sep 22, 2009 at 1:47 PM, Uwe Schindler wrote: > Welcome Koji! > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Koji Sekiguchi [mailto:k...@r.email.