Re: svn commit: r901662 - in /lucene/java/trunk: contrib/analyzers/common/src/java/org/tartarus/snowball/ contrib/snowball/ contrib/snowball/src/java/org/tartarus/snowball/ src/java/org/apache/lucen

2010-01-21 Thread Michael McCandless
Duh, I didn't even notice we also had an existing ArrayUtilTest yes please post a patch! Mike On Thu, Jan 21, 2010 at 8:59 AM, Uwe Schindler wrote: > Somehow we have now both ArrayUtilTest and TestArrayUtil? I think they should > be merged and TestArrayUtil as name preferred (as all other t

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-28 Thread Michael McCandless
On Thu, Jan 28, 2010 at 5:20 AM, Uwe Schindler wrote: > Can we fix NIOFSIndexInput to simply reopen the channel when the exception > occurs? The problem is that the file may have been deleted in the meantime. This is quite a nasty behavior of FileChannel. Mike ---
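For readers new to the thread, the FileChannel behavior under discussion can be reproduced with a small sketch. It assumes only the JDK; the class and method names are made up for illustration and are not Lucene code. A blocking read on a thread whose interrupt status is set throws ClosedByInterruptException and leaves the channel closed for every other user of it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptClosesChannel {
    // Hypothetical helper: reports whether the channel is still open after a
    // read was attempted on an interrupted thread.
    static boolean channelOpenAfterInterruptedRead() throws IOException {
        Path tmp = Files.createTempFile("nio-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // Set the interrupt status before the blocking I/O call.
                Thread.currentThread().interrupt();
                try {
                    ch.read(ByteBuffer.allocate(4));
                } catch (ClosedByInterruptException expected) {
                    // The channel has been closed as a side effect of the interrupt.
                } finally {
                    // Clear the interrupt status so cleanup below is unaffected.
                    Thread.interrupted();
                }
                return ch.isOpen();
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("still open: " + channelOpenAfterInterruptedRead());
    }
}
```

Simply reopening the channel, as suggested in the quoted message, would have to cope with the file having been deleted in the meantime.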

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-28 Thread Michael McCandless
On Thu, Jan 28, 2010 at 5:59 AM, Uwe Schindler wrote: > But if you keep the underlying RandomAccessFile open? We could do that but... won't this consume 2 file descriptors? Mike - To unsubscribe, e-mail: java-dev-unsubscr...@lu

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-28 Thread Michael McCandless
>> >> Possibly we could wait until Simon provides a testcase that fails. >> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >> > -Original Message-

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-28 Thread Michael McCandless
On Thu, Jan 28, 2010 at 6:38 AM, Uwe Schindler wrote: > So I checked the code of NIOFSIndexInput, my last comment was not really > correct: > NIOFSIndexInput extends SimpleFSIndexInput and that opens the RAF. In the > ctor RAF.getChannel() is called. The RAF keeps open until the file is closed

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-28 Thread Michael McCandless
. Or, 3) don't use NIOFSDir! Mike On Thu, Jan 28, 2010 at 7:29 AM, Simon Willnauer wrote: > On Thu, Jan 28, 2010 at 12:43 PM, Michael McCandless > wrote: >> On Thu, Jan 28, 2010 at 6:38 AM, Uwe Schindler wrote: >> >>> So I checked the code of NIOFSIndex

Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-29 Thread Michael McCandless
nd people could then report Lucene > 3.X has slowed... > > On Thu, Jan 28, 2010 at 5:24 AM, Michael McCandless > wrote: >> Bummer. >> >> So the only viable workarounds are 1) don't use Thread.interrupt (nor, >> things like Future.cancel, which in turn us

Re: Release Lucene Java 2.9.2 & 3.0.(1|2) together soon

2010-02-07 Thread Michael McCandless
+1 to release. Thank you for volunteering :) We've got a number of good bug fixes pending... But: I think we should simply name it 3.0.1? If we skip 3.0.1 I think it will cause confusion? We can state in the CHANGES that 2.9.2 has same bug fixes as 3.0.1 and vice/versa? Mike On Sun, Feb 7, 2

Re: Build failed in Hudson: Lucene-trunk #1088

2010-02-09 Thread Michael McCandless
On Tue, Feb 9, 2010 at 9:31 AM, Uwe Schindler wrote: > The TestSpellChecker Executor problem seems to be a sun bug fixed in JDK > 1.5.0_17 (awaitTermination problem: > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6576792 and related bugs). > We updated lucene-zones's JVM for builds to the

Re: Lucene Flex Branch

2010-02-10 Thread Michael McCandless
:) Also LUCENE-2111 is where flex development is "continuing"... LUCENE-1458 got too big (loading the page was slow). Mike On Wed, Feb 10, 2010 at 11:17 AM, Robert Muir wrote: > Hello, I think of it as "Mike Mccandless's Emacs workspace" (joking) > > it is a branch located here: > https://svn.a

Re: Build failed in Hudson: Lucene-trunk #1091

2010-02-11 Thread Michael McCandless
This is LUCENE-2118 -- it strikes every so often. I think it's harmless (but really annoying). It looks like a corner case in the merge policy where somehow a level is able to have 11 segments after merging thinks it's done. Mike On Thu, Feb 11, 2010 at 3:03 AM, Uwe Schindler wrote: > Really c

Re: svn commit: r909357 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/index/ src/test/org/apache/lucene/index/

2010-02-12 Thread Michael McCandless
Woops, right -- I just fixed. Thanks for catching it :) Mike On Fri, Feb 12, 2010 at 6:55 AM, Koji Sekiguchi wrote: > Mike, > > You said "removeUnusedFiles" in CHANGES.txt, but isn't it > "deleteUnusedFiles"? > > Koji > > -- > http://www.rondhuit.com/en/ > > > mikemcc...@apache.org wrote: >> >>

Re: (LUCENE-1844) Speed up junit tests

2010-02-14 Thread Michael McCandless
ert Muir wrote: > On Fri, Nov 27, 2009 at 1:27 PM, Michael McCandless > wrote: >> >> Also one thing I'd love to try is NOT forking the JVM for each test >> (fork="no" in the junit task).  I wonder how much time that'd buy... >> > > it shaves

Re: IndexWriter.init() checks for infoStream != null, redundantly?

2010-02-14 Thread Michael McCandless
IndexWriter has a default infoStream, so the infoStream could be non-null during init. Mike On Sat, Feb 13, 2010 at 3:16 PM, Shai Erera wrote: > Hi > > IndexWriter.init() checks a couple of times whether infoStream != null in > order to print informative messages ... init() is called only from t

Re: (LUCENE-1844) Speed up junit tests

2010-02-20 Thread Michael McCandless
nally! Are there any possibilities inside Eclipse/other-IDEs >> to check this? >> >> Uwe >> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> > -Origina

Re: IndexFileNames

2010-02-23 Thread Michael McCandless
This class makes me somewhat nervous, with the changes coming in flex, because the extensions are no longer static but rather a function of the particular codec you're using in the index. I've changed some of the constants accordingly (on flex). Still, I think it's OK to make it public (flex has

Re: IndexFileNames

2010-02-23 Thread Michael McCandless
ode, I didn't find in places I checked that a > reference to *just* the extension name is needed. > > And thanks for correcting me on the package-private and back-compat thing. > In my mind it was already public :). > > Shai > > On Tue, Feb 23, 2010 at 1:16 PM, Michael Mc

Re: Looks like we missed a little change for 3.0 ...

2010-02-23 Thread Michael McCandless
Sigh... yes, better to turn these into Jira issues in general. We could make the change under Version? (Change to true, starting in 3.1). Or maybe not make the change. If set to true, we use pct deletion on a segment to reduce its perceived size when selecting merges, which generally causes seg

Re: IndexFileNames

2010-02-23 Thread Michael McCandless
On Tue, Feb 23, 2010 at 6:46 AM, Shai Erera wrote: > I don't think performance is the issue here, but rather correctness. Someone > cannot just ask filename.endsWith(DELETION_EXT) as files like "file1del" > would match as well. So whenever you make such check, you need to add ".". > Again, not per
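The correctness point in the quoted message can be sketched as follows. This is a minimal illustration, not the IndexFileNames API: the class, the helper names, and the "del" value for DELETION_EXT are assumptions chosen to match the "file1del" example.

```java
public class ExtensionMatchSketch {
    // Hypothetical constant: the extension stored without its leading dot.
    static final String DELETION_EXT = "del";

    // Naive check: also matches names that merely end in the same letters.
    static boolean naiveMatch(String filename) {
        return filename.endsWith(DELETION_EXT);
    }

    // Safer check: require the "." separator before the extension.
    static boolean matchesExtension(String filename, String ext) {
        return filename.endsWith("." + ext);
    }

    public static void main(String[] args) {
        System.out.println(naiveMatch("file1del"));                     // false positive
        System.out.println(matchesExtension("file1del", DELETION_EXT)); // rejected
        System.out.println(matchesExtension("_1.del", DELETION_EXT));   // accepted
    }
}
```

Centralizing the dot handling in one helper avoids every caller having to remember to prepend ".".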

Re: Add IndexWriter.doBeforeFlush()

2010-02-23 Thread Michael McCandless
+1 to both adding doAfterFlush and making the two methods protected. Patch? Mike On Tue, Feb 23, 2010 at 6:55 AM, Shai Erera wrote: > Hi, > > Can we add to IW a doBeforeFlush(), similar to doAfterFlush(), which will > get called before flush actually happens (i.e., at the beginning of > flush()

Re: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts - Take #2

2010-02-23 Thread Michael McCandless
+1 to release. I used each version's binary release to build & search a 5M wikipedia index. Search performance is the same for TermQuery with both releases, but for PhraseQuery (at least the 3 simple 2-word phrases I tested) was ~9% faster (20.49 QPS -> 22.29 QPS). Not sure why... but it's movin

Re: MatchAllDocsQueryNode toString() creates invalid XML-Tag

2010-02-24 Thread Michael McCandless
This sounds like a bug -- can you open an issue? Thanks! Mike On Wed, Feb 24, 2010 at 10:04 AM, Frank Wesemann wrote: > Hi, > I am just getting my feet wet with the queryParser in contrib/queryparser. > This new API is really a huge improvement. > I am using it to convert Solr-Style input into

Re: Field level document update

2010-02-25 Thread Michael McCandless
Possible approaches have been discussed on the list, fairly recently, but I don't think there's active work against it... Mike On Thu, Feb 25, 2010 at 5:24 AM, Anshum wrote: > Hi, > I'd like to know do we have something for field level document updation > planned for the near future? Something t

Baby steps towards making Lucene's scoring more flexible...

2010-02-26 Thread Michael McCandless
In thinking about & discussing with Robert how to allow Lucene to support other scoring models, eg lnu.ltc, BM25, etc I think a relatively contained set of changes can give us a solid step forward. Something like this: * Store additional per-doc stats in the index, eg in a custom posting

Re: SegmentInfos extends Vector

2010-02-28 Thread Michael McCandless
This class is @lucene.experimental, so we are free to break it. +1 to not "extends Vector". I don't think we should change to @lucene.internal since the thinking is apps outside Lucene should be able to introspect and see segment structure in the index. Ie we made this API public so people o

Re: Turning IndexReader.isDeleted implementations to final

2010-02-28 Thread Michael McCandless
Seems OK I think? Mike On Sun, Feb 28, 2010 at 12:37 AM, Shai Erera wrote: > Hi > > Do you think it's worth to make some of the isDeleted method impls final, > like in ReadOnlySegmentReader and (maybe) DirectoryReader? I'm thinking the > classes that are perceived as final could benefit from tha

Re: Turning IndexReader.isDeleted implementations to final

2010-02-28 Thread Michael McCandless
wrote: > What's ok? making the classes final or just the method declaration? If > classes, besides ReadOnlySegmentReader, which other impls do you think can > be made final (I'm not in front of the code)? > > On Sun, Feb 28, 2010 at 7:05 PM, Michael McCandless > wrote: >>

Re: Turning IndexReader.isDeleted implementations to final

2010-03-01 Thread Michael McCandless
>> > e.g. the collect methods in TFDC should be final and so on. But there is no >> > requirement anymore. And Lucene 3.1 only runs with Java 5+, so who cares? >> > >> > - >> > Uwe Schindler >> > H.-H.-Meier-Allee 63, D-28213 Bremen >> >

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-02 Thread Michael McCandless
On Sun, Feb 28, 2010 at 1:38 PM, Marvin Humphrey wrote: > On Fri, Feb 26, 2010 at 12:50:44PM -0500, Michael McCandless wrote: > >> * Store additional per-doc stats in the index, eg in a custom >> posting list, > > Inline, as in a payload? Of course that can wo

Re: Turning IndexReader.isDeleted implementations to final

2010-03-03 Thread Michael McCandless
wrote: > In the analyzers case, I don't think its really door-shutting. if someone > extends an Analyzer, its likely to just result in problems from the > tokenStream/reusableTokenStream mess. > > On Wed, Mar 3, 2010 at 11:10 AM, Grant Ingersoll > wrote: >> >

Re: Turning IndexReader.isDeleted implementations to final

2010-03-03 Thread Michael McCandless
On Wed, Mar 3, 2010 at 11:10 AM, Grant Ingersoll wrote: > > On Mar 1, 2010, at 2:51 AM, Michael McCandless wrote: > >> Yeah in the case of DirectoryReader/MultiReader, I'd like for them to >> be final, not for performance but for door-shutting (ie the same >>

Fwd: (SOLR-355) Parsing mixed inclusive/exclusive range queries

2010-03-04 Thread Michael McCandless
If Solr/Lucene dev were merged, and queryParser is its own module, this user could simply upgrade his queryParser JAR to get this fix. Mike -- Forwarded message -- From: Alexander S (JIRA) Date: Thu, Mar 4, 2010 at 2:24 AM Subject: (SOLR-355) Parsing mixed inclusive/exclusive r

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-04 Thread Michael McCandless
On Tue, Mar 2, 2010 at 4:12 PM, Marvin Humphrey wrote: > On Tue, Mar 02, 2010 at 05:55:44AM -0500, Michael McCandless wrote: >> The problem is, these scoring models need the avg field length (in >> tokens) across the entire index, to compute the norms. >> >> Ie, you

Re: IndexWriter.applyDeletes performance

2010-03-05 Thread Michael McCandless
Currently you can't tell IW to use the pool (ie, pool is only enabled if you use NRT readers). We should probably make this an option at ctor time, for situations like this. (In fact, in followon discussions about further improvements to NRT we've already discussed having such an option to IW's c

Re: IndexWriter.applyDeletes performance

2010-03-05 Thread Michael McCandless
OK I opened: https://issues.apache.org/jira/browse/LUCENE-2297 Mike On Fri, Mar 5, 2010 at 10:25 AM, Michael McCandless wrote: > Currently you can't tell IW to use the pool (ie, pool is only enabled > if you use NRT readers).  We should probably make this an option at >

Re: (LUCENE-2294) Create IndexWriterConfiguration and store all of IW configuration there

2010-03-05 Thread Michael McCandless
On Fri, Mar 5, 2010 at 3:56 PM, Mark Miller wrote: > On 03/05/2010 03:43 PM, Michael McCandless (JIRA) wrote: >>      [ >> https://issues.apache.org/jira/browse/LUCENE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-06 Thread Michael McCandless
On Fri, Mar 5, 2010 at 1:54 PM, Marvin Humphrey wrote: > On Thu, Mar 04, 2010 at 12:23:38PM -0500, Michael McCandless wrote: >> > In a multi-node search cluster, pre-calculating norms at index-time >> > wouldn't work well without additional communication between nodes

Re: svn commit: r920240 - in /lucene/java/branches/flex_1458: ./ contrib/ contrib/analyzers/common/src/java/org/tartarus/snowball/ contrib/highlighter/src/test/ contrib/instantiated/src/test/org/apa

2010-03-08 Thread Michael McCandless
On Mon, Mar 8, 2010 at 4:17 AM, wrote: > Author: uschindler > Date: Mon Mar  8 09:17:03 2010 > New Revision: 920240 > > URL: http://svn.apache.org/viewvc?rev=920240&view=rev > Log: > Merge flex up to trunk rev 920237. > > This revision was left out, because it conflicted "heavy": 919060 > Message

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Michael McCandless
On Sun, Mar 7, 2010 at 1:21 PM, Marvin Humphrey wrote: > On Sat, Mar 06, 2010 at 05:07:18AM -0500, Michael McCandless wrote: >> It won't encounter an unknown posting format. It's the codec. It >> knows all posting formats by the time it sees it. > > OK, so you

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Michael McCandless
On Mon, Mar 8, 2010 at 2:07 PM, Steven A Rowe wrote: > On 03/08/2010 at 1:57 PM, Steven A Rowe wrote: >> On 03/08/2010 at 1:13 PM, Michael McCandless wrote: >> > On Sun, Mar 7, 2010 at 1:21 PM, Marvin Humphrey >> > wrote: >> > > On Sat, Mar 06, 2010 at 05:

Re: Multi-node stats within individual nodes (was "Baby steps...")

2010-03-08 Thread Michael McCandless
On Sun, Mar 7, 2010 at 11:43 AM, Marvin Humphrey wrote: > On Sat, Mar 06, 2010 at 05:07:18AM -0500, Michael McCandless wrote: >> > Fortunately, beaming field length data around is an easier problem than >> > distributed IDF, because with rare exceptions, the number of fie

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-09 Thread Michael McCandless
On Mon, Mar 8, 2010 at 9:47 PM, Marvin Humphrey wrote: > On Mon, Mar 08, 2010 at 01:13:53PM -0500, Michael McCandless wrote: >> I think we can actually do so w/o losing Lucene's loose typing if we >> simply peeled out [say] a FieldType class that holds the settings you >

Re: Multi-node stats within individual nodes (was "Baby steps...")

2010-03-09 Thread Michael McCandless
On Tue, Mar 9, 2010 at 2:28 AM, Marvin Humphrey wrote: > On Mon, Mar 08, 2010 at 02:23:47PM -0500, Michael McCandless wrote: >> For a large index the stats will be stable after re-indexing only a >> few more docs. > > Well, not if there's been huge churn on other no

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-09 Thread Michael McCandless
On Tue, Mar 9, 2010 at 10:03 AM, Marvin Humphrey wrote: > On Tue, Mar 09, 2010 at 05:06:08AM -0500, Michael McCandless wrote: >> > For what it's worth, that's sort of the way KS used to work: >> > Schema/FieldType >> > information was stored entirely in

Re: Multi-node stats within individual nodes (was "Baby steps...")

2010-03-09 Thread Michael McCandless
On Tue, Mar 9, 2010 at 2:11 PM, Marvin Humphrey wrote: >> > I don't know that compressing the raw materials is going to work as well as >> > compressing the final product.  Early quantization errors get compounded >> > when >> > used in later calculations. >> >> I would not compress for starters.

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-11 Thread Michael McCandless
On Tue, Mar 9, 2010 at 3:58 PM, Marvin Humphrey wrote: > On Tue, Mar 09, 2010 at 01:18:12PM -0500, Michael McCandless wrote: >> >> >> You said "of course" before but... how in your proposal could one >> >> store all stats for a given field during indexi

Re: Welcome Chris Male as Contrib committer!

2010-03-12 Thread Michael McCandless
Welcome aboard Chris! Mike On Fri, Mar 12, 2010 at 9:17 AM, Mark Miller wrote: > I am happy to announce the Lucene PMC has accepted Chris Male as a > contrib committer! > > Chris has been making a lot of headway in cleaning up the spatial contrib > lately, > and hopefully now we can get more of

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-13 Thread Michael McCandless
On Thu, Mar 11, 2010 at 12:35 PM, Marvin Humphrey wrote: > On Mon, Mar 08, 2010 at 02:10:35PM -0500, Michael McCandless wrote: > >> We ask it to give us a Codec. > > There's a conflict between the segment-wide role of the "Codec" class and its > role as speci

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-13 Thread Michael McCandless
On Fri, Mar 12, 2010 at 8:31 PM, Marvin Humphrey wrote: > On Thu, Mar 11, 2010 at 05:59:03AM -0500, Michael McCandless wrote: >> > So there would be polymorphism in the decoding phase while we're supplying >> > information the Similarity object needs to make its similari

Re: Different behavior of Directory.fieldLength()

2010-03-13 Thread Michael McCandless
I like the proposed new semantics (throw FNFE if the file does not exist), and the migration path (new method, deprecate old). Mike On Sat, Mar 13, 2010 at 7:46 AM, Shai Erera wrote: > I think it falls under the semantics of dir.fileLength() and not the > semantics of the implementation right? U
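A minimal sketch of the proposed semantics, assuming a plain java.io.File backing store. The class and method names are hypothetical and this is not the actual Directory implementation. File.length() returns 0 both for an empty file and for a missing one, so the missing case has to be checked explicitly.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class FileLengthSketch {
    // Hypothetical helper illustrating "throw FNFE if the file does not exist".
    static long fileLength(File dir, String name) throws FileNotFoundException {
        File file = new File(dir, name);
        long len = file.length();
        // A length of 0 is ambiguous: distinguish "empty" from "missing".
        if (len == 0 && !file.exists()) {
            throw new FileNotFoundException(name);
        }
        return len;
    }

    public static void main(String[] args) {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        try {
            fileLength(tmpDir, "no-such-file-" + System.nanoTime());
        } catch (FileNotFoundException e) {
            System.out.println("missing file reported: " + e.getMessage());
        }
    }
}
```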

Re: Different behavior of Directory.fieldLength()

2010-03-13 Thread Michael McCandless
Thanks! Mike On Sat, Mar 13, 2010 at 9:10 AM, Shai Erera wrote: > Ok, opened LUCENE-2316 to track this. > > Shai > > On Sat, Mar 13, 2010 at 3:49 PM, Michael McCandless > wrote: >> >> I like the proposed new semantics (throw FNFE if the file does not >>

Re: [DISCUSS] Do away with Contrib Committers and make core committers

2010-03-14 Thread Michael McCandless
+1 Mike On Sun, Mar 14, 2010 at 11:53 AM, Grant Ingersoll wrote: > Given the notion of "one project, one set of committers", I think we should > do away with the notion of contrib committers for java-dev and just have > everyone be committers.  Practically speaking, this would make all existin

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-15 Thread Michael McCandless
On Mon, Mar 15, 2010 at 12:03 AM, Marvin Humphrey wrote: > On Sat, Mar 13, 2010 at 06:41:26AM -0500, Michael McCandless wrote: > >> I still don't think similarity should have any bearing during indexing. > > Similarity has always, from day one, affected the contents of

Welcome new committers!

2010-03-15 Thread Michael McCandless
The merge of Solr and Lucene dev is well underway... Lucene already has a bunch of new committers... welcome aboard! And overnight tons of work was done (and beer, espresso and tea, depending on your timezone, consumed ;) and now we already have a branch where Solr has been upgraded to Lucene's tr

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote: > On 3/16/10 12:43 AM, Simon Willnauer wrote: >> >> If my impression should be wrong or if I miss something please ignore >> the last paragraph. > > I feel exactly like you, Simon. I don't understand the rush. Also, we're > in review-and-comm

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
I think I like the 1st option best (lucene moves as subdir to solr's current trunk SVN path), but I don't feel strongly. This'd mean one could simply checkout lucene alone and do everything you can do today. But if you check out solr, you also get a full checkout of lucene, and solr's build.xml

Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Michael McCandless
+1, this looks great! Mike On Tue, Mar 16, 2010 at 1:52 PM, Andi Vajda wrote: > > On Mar 16, 2010, at 11:47, Steven A Rowe wrote: > >> On 03/16/2010 at 6:06 AM, Michael McCandless wrote: >>> >>> Does anyone know how other projects fold in IRC...? >> >

Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Michael McCandless
On Tue, Mar 16, 2010 at 2:17 PM, Michael Busch wrote: > But at the same time can we make sure that the decisions that are made on > IRC are still being described in a jira issue? +1 Any time something is discussed on IRC, it must be summarized on the lists or in an issue, with the details based

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
The primary concern seems to be ensuring that, once we merge svn, one can still checkout & build & run tests/etc for Lucene alone. If we move lucene under Solr's existing svn path, ie: /solr/trunk/lucene and then fixup solr's build files to go and compile sources from the lucene dir, run tests

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
Dev is now merged with Solr and Lucene -- that has already passed. If that will scare customers away, that's a risk we take -- the benefits of merged dev outweigh that, in my opinion. The incremental risk that the details of our svn URLs will scare people away seems negligible. And we can always

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
But it's actually the reverse? Solr depends on Lucene but not vice/versa. (If instead I proposed making Solr a subdir of Lucene then I'd agree) So... if you checkout only lucene, you can cd there and do all you do today with Lucene ("ant test", "ant dist", "svn diff", etc.). If you checkout

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
+1 I like this proposal! I agree we should not preclude the future (modules), let's just not hold up dev today until we solve it. I agree your side by side solution would allow for us to later factor up modules (eg analyzers). Mike On Tue, Mar 16, 2010 at 5:47 PM, Michael McCandless

Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
Duh -- I meant to reply to Hoss' proposal, below: On Tue, Mar 16, 2010 at 5:55 PM, Michael McCandless wrote: > +1 > > I like this proposal! > > I agree we should not preclude the future (modules), let's just not > hold up dev today until we solve it. > > I agre

Re: IndexWriter.synced field accumulates data

2010-03-17 Thread Michael McCandless
You're right! Really we should delete from sync'd when we delete the files. We need to tie into IndexFileDeleter for that, maybe moving this set into there. Though in practice the amount of actual RAM used should rarely be an issue? But we should fix it... Can you open an issue? Mike On Wed,

Re: IndexWriter.synced field accumulates data

2010-03-18 Thread Michael McCandless
Thanks! Mike On Wed, Mar 17, 2010 at 3:16 PM, Gregor Kaczor wrote: > followup in > > https://issues.apache.org/jira/browse/LUCENE-2328 > > > Original-Nachricht >> Datum: Wed, 17 Mar 2010 14:30:25 -0500 >> Von: Michael McCandless >> An: j

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-18 Thread Michael McCandless
On Mon, Mar 15, 2010 at 7:49 PM, Marvin Humphrey wrote: > On Mon, Mar 15, 2010 at 05:28:33AM -0500, Michael McCandless wrote: >> I mean specifically one should not have to commit to the precise >> scoring model they will use for a given field, when they index that >> field.

Re: How can I use QueryScorer() to find only perfect matches??

2010-03-18 Thread Michael McCandless
Unfortunately, highlighter (and I think also fast vector highlighter) are able to return a set of fragments which do not match the query (eg, they only show one of the two required terms). I really don't like that they do this. Ideally (to me) the entire excerpt (ie, all fragments appended togeth

Re: lucene and solr trunk

2010-03-18 Thread Michael McCandless
All tests pass for me :) Mike On Thu, Mar 18, 2010 at 12:27 PM, Mark Miller wrote: > Alright, so we have implemented Hoss' suggestion here on the lucene/solr > merged dev branch at lucene/solr/branches/newtrunk. > > Feel free to check it out and give some feedback. > > We also roughly have Solr r

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Michael McCandless
If you build the ords per-segment, how do you compare results across segments? Ie, in the non-Collator case, Lucene stores ords but must also store the actual String so that the FieldComparator is able to compare results across segments Mike On Fri, Mar 19, 2010 at 10:06 AM, Toke Eskildsen wro

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Michael McCandless
On Fri, Mar 19, 2010 at 12:46 PM, Toke Eskildsen wrote: > However, it is not set in stone that we will shift to using Exposed or > similar: As many others we're pursuing real-time indexing and while Exposed > sits at the segment-level and thus works well for re-open, big segment > changes sti

Re: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling

2010-03-22 Thread Michael McCandless
aphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Michael McCandless (JIRA) [mailto:j...@apache.org] >> Sent: Monday, March 22, 2010 11:22 AM >> To: java-dev@lucene.apache.org >> Subject: [jira] Resolved: (LUCENE-2297) IndexWriter sho

Re: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling

2010-03-22 Thread Michael McCandless
t you were > suggesting? > Cheers > Chris > > On Mon, Mar 22, 2010 at 11:37 AM, Michael McCandless > wrote: >> >> I think we should. >> >> It (newtrunk) was created to test Hoss's side-by-side proposal, and >> that approach looks to be working very we

Re: Mailing List merge

2010-03-22 Thread Michael McCandless
+1, let's do this now. Mike On Mon, Mar 22, 2010 at 11:44 AM, Grant Ingersoll wrote: > Shall we merge the dev mailing lists?  This should reduce the cross-posting > and can be completely automated (other than you may have to update your > client-side filters) and was part of the plan to merge

Re: Mailing List merge

2010-03-22 Thread Michael McCandless
+1 Mike On Mon, Mar 22, 2010 at 11:53 AM, Ryan McKinley wrote: > why not just "d...@lucene.apache.org"? > > > > On Mon, Mar 22, 2010 at 11:44 AM, Grant Ingersoll wrote: >> Shall we merge the dev mailing lists?  This should reduce the cross-posting >> and can be completely automated (other than

Re: Set IDF value manually on a search query

2010-03-22 Thread Michael McCandless
You can create your own Similarity implementation? Mike On Mon, Mar 22, 2010 at 12:32 PM, zsl wrote: > > Hi all! > > I'm developing an application that uses Lucene and I'm trying to set the IDF > manually before I do a query. > In other words, is there a way to do a search query with an IDF value >

Re: Implementing new collectors

2010-03-23 Thread Michael McCandless
You can implement just the "out of order" collector, since it subsumes the in-order case, and all will work fine. However, if the collector can save CPU when docs are known to arrive in-order (not all collectors can) it'd be good to make a separate in-order one as well. Mike On Tue, Mar 23, 2010

Re: Implementing new collectors

2010-03-23 Thread Michael McCandless
OK put it up! Sounds good :) Mike On Tue, Mar 23, 2010 at 1:54 PM, Grant Ingersoll wrote: > > On Mar 23, 2010, at 1:20 PM, Michael McCandless wrote: > >> You can implement just the "out of order" collector, since it subsumes >> the in-order case, and all will w

Re: (LUCENE-2344) PostingsConsumer#merge does not call finishDoc

2010-03-24 Thread Michael McCandless
Ahh, very nice! Mike On Wed, Mar 24, 2010 at 11:39 AM, Michael McCandless (JIRA) wrote: > >    [ > https://issues.apache.org/jira/browse/LUCENE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849228#action_12849228 > ] >

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-25 Thread Michael McCandless
On Mon, Mar 22, 2010 at 12:45 PM, Marvin Humphrey wrote: > On Thu, Mar 18, 2010 at 05:16:23AM -0500, Michael McCandless wrote: >> Also, will Lucy store the original stats? > > These? > > * Total number of tokens in the field. > * Number of unique terms in th

Welcome Shai Erera as Lucene/Solr committer

2010-03-26 Thread Michael McCandless
I'm happy to announce that the PMC has accepted Shai Erera as Lucene/Solr committer! Welcome aboard Shai, Mike PS: it's custom to introduce yourself with a brief bio :)

Re: Modules

2010-03-26 Thread Michael McCandless
I think we should consolidate all query parsers as a module? And all queries (contrib/queries + oal/search/*Query)? I don't think we should leave "basic X" inside core... I think there should be one place to get the different Xs lucene offers (where X is a query parser, queries, analyzers, etc.).

Re: svn commit: r928246 [1/6] - in /lucene/java/branches/flex_1458: ./ backwards/src/ backwards/src/java/org/apache/lucene/search/ backwards/src/test/org/apache/lucene/analysis/ backwards/src/test/o

2010-03-27 Thread Michael McCandless
I'm merging the conflicts now... it turned out to cause a number of conflicts because in flex I changed how DW stores the terms in RAM, to first prefix-code each term's length as vInt (most often 1 byte, but in messed up cases 2 bytes), and then the term's characters as UTF8 bytes. This cause
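The layout described above (a vInt length prefix followed by the term's UTF8 bytes) might look roughly like the sketch below. It uses the usual vInt scheme of 7 payload bits per byte with the high bit as a continuation flag. The class and method names are hypothetical, and taking the length in UTF8 bytes rather than chars is an assumption, since the message does not say which is used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TermBytesSketch {
    // Write a non-negative int as a vInt: low-order 7 bits first,
    // high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Hypothetical encoder: vInt length prefix, then the UTF-8 bytes.
    static byte[] encodeTerm(String term) {
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, utf8.length); // assumption: length counted in bytes
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeTerm("abc").length); // 1 prefix byte + 3 term bytes
    }
}
```

Any length below 128 fits in a single prefix byte, which matches the "most often 1 byte" remark; lengths from 128 to 16383 need two.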

Re: svn commit: r928246 [1/6] - in /lucene/java/branches/flex_1458: ./ backwards/src/ backwards/src/java/org/apache/lucene/search/ backwards/src/test/org/apache/lucene/analysis/ backwards/src/test/o

2010-03-27 Thread Michael McCandless
Right... in fact as long as we land flex before 3.1 releases then this is not a back-compat break (but we should heavily advertise the change in semantics) ;) Ie Directory.copy used to filter for only index files, but Directory.copyTo copies everything so you must provide your own list if this mat

Re: svn commit: r928246 [1/6] - in /lucene/java/branches/flex_1458: ./ backwards/src/ backwards/src/java/org/apache/lucene/search/ backwards/src/test/org/apache/lucene/analysis/ backwards/src/test/o

2010-03-27 Thread Michael McCandless
Ugh, actually, it is still a back-compat break :( Because Directory.copy just forwards to copyTo. I'll advertise in CHANGES for flex. Mike On Sat, Mar 27, 2010 at 4:41 PM, Michael McCandless wrote: > Right... in fact as long as we land flex before 3.1 releases then this > is not a

Re: svn commit: r928246 [1/6] - in /lucene/java/branches/flex_1458: ./ backwards/src/ backwards/src/java/org/apache/lucene/search/ backwards/src/test/org/apache/lucene/analysis/ backwards/src/test/o

2010-03-27 Thread Michael McCandless
ll advertise in CHANGES for flex. >> >> Mike >> >> On Sat, Mar 27, 2010 at 4:41 PM, Michael McCandless >> wrote: >>> Right... in fact as long as we land flex before 3.1 releases then this >>> is not a back-compat break (but we should heavily adv

Re: Incremental Field Updates

2010-03-29 Thread Michael McCandless
I agree this is a long overdue feature... we need to get it into Lucene somehow. I like the Layers analogy... I think that will work well with Lucene's transactional semantics, ie a prior commit point would continue to see the index before the updates but new commit points would see the updates.

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-29 Thread Michael McCandless
On Thu, Mar 25, 2010 at 1:20 PM, Marvin Humphrey wrote: > On Thu, Mar 25, 2010 at 06:24:34AM -0400, Michael McCandless wrote: >> >> Also, will Lucy store the original stats? >> > >> > These? >> > >> > * Total number of tokens in the

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-29 Thread Michael McCandless
I think that's a good idea for Lucy. Mike On Fri, Mar 26, 2010 at 10:58 AM, Marvin Humphrey wrote: > On Thu, Mar 25, 2010 at 06:24:34AM -0400, Michael McCandless wrote: >> > Maybe aggressive automatic data-reduction makes more sense in the context >> > of >>

Landing the flex branch

2010-03-30 Thread Michael McCandless
I think the time has finally come! Pending one issue (LUCENE-2354 -- Uwe), I think flex is ready to land. I think the other issues with Fix Version = Flex Branch can be moved to 3.1 after we land. We still use the pre-flex APIs in a number of places... I think this is actually good (so we cont

Re: Welcome Uwe Schindler to the Lucene PMC

2010-04-01 Thread Michael McCandless
Welcome Uwe!! Mike On Thu, Apr 1, 2010 at 7:05 AM, Grant Ingersoll wrote: > I'm pleased to announce that the Lucene PMC has voted to add Uwe Schindler to > the PMC.  Uwe has been doing a lot of work in Lucene and Solr, including > several of the last releases in Lucene. > > Please join me in e

Re: Incremental Field Updates

2010-04-03 Thread Michael McCandless
On Sat, Apr 3, 2010 at 1:25 AM, Babak Farhang wrote: >> I think they get merged in by the merger, ideally in the background. > > That sounds sensible. (In other words, we wont concern ourselves with > roll backs--something possible while a "layer" is still around.) Actually roll backs would still

Re: Term space continuity

2010-04-05 Thread Michael McCandless
The flex API isolates fields, ie you get a TermsEnum for a given field and it enums only the term's text (as a BytesRef). Mike On Mon, Apr 5, 2010 at 7:22 PM, Earwin Burrfoot wrote: > A random thought from some of the earlier discussions. > > Had anybody used the fact that Lucene Term space is c

Re: Incremental Field Updates

2010-04-06 Thread Michael McCandless
have to be ordered if we > introduce updates?  Or does the onus of maintaining order fall on the > application? > > -Babak > > On Sat, Apr 3, 2010 at 3:28 AM, Michael McCandless > wrote: >> On Sat, Apr 3, 2010 at 1:25 AM, Babak Farhang wrote: >>>> I think the

Re: Getting fsync out of the loop

2010-04-06 Thread Michael McCandless
On Tue, Apr 6, 2010 at 10:11 AM, Earwin Burrfoot wrote: > So, I want to pump my IndexWriter hard and fast with documents. Nice. > Removing fsync from FSDirectory helps. But for that I pay with possibility of > index corruption, not only if my node suddenly loses > power/kernelpanics, but also i

Re: Getting fsync out of the loop

2010-04-07 Thread Michael McCandless
On Tue, Apr 6, 2010 at 7:26 PM, Earwin Burrfoot wrote: >> Running out of disk space with fsync disabled won't lead to corruption. >> Even kill -9 the JRE process with fsync disabled won't corrupt. >> In these cases index just falls back to last successful commit. >> >> It's "only" power loss / OS
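
For readers following the fsync discussion, the trade-off can be sketched with plain java.nio. This is a minimal illustration, not Lucene's FSDirectory code: the class FsyncSketch and the method writeFile are hypothetical names. With sync set to false, the written bytes may sit in the OS write cache, which survives a killed process but not a power loss or OS crash; with sync set to true, FileChannel.force(true) asks the OS to push content and metadata to the storage device before returning.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
class FsyncSketch {

    // Writes data to file, optionally forcing it to the storage device.
    static void writeFile(Path file, byte[] data, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // A single write call may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                // Without this call the data may only be in the OS write cache.
                ch.force(true);
            }
        }
    }
}
```

Either way the file reads back identically while the machine stays up; the two modes differ only in what is guaranteed to be on disk after a power failure.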

Re: Commit freeze in flex branch

2010-04-07 Thread Michael McCandless
Yes +1 to that -- thanks Uwe!! And thanks to the many other people who helped out on flex. It's a big and exciting improvement :) Mike On Wed, Apr 7, 2010 at 4:11 PM, Michael Busch wrote: > Uwe, thanks for doing all the svn work!  Was a smooth transition! > >  Michael > > On 4/6/10 12:27 PM,

Re: Getting fsync out of the loop

2010-04-08 Thread Michael McCandless
On Wed, Apr 7, 2010 at 3:27 PM, Earwin Burrfoot wrote: >> No, this doesn't make sense.  The OS detects a disk full on accepting >> the write into the write cache, not [later] on flushing the write >> cache to disk.  If the OS accepts the write, then disk is not full (ie >> flushing the cache will

Re: Move NoDeletionPolicy to core

2010-04-08 Thread Michael McCandless
+1 I don't think bw needs to be kept -- contrib/benchmark is allowed to change. Mike On Thu, Apr 8, 2010 at 5:44 AM, Shai Erera wrote: > Hi > > I've noticed benchmark has a NoDeletionPolicy class and I was wondering if > we can move it to core. I might want to use it for the parallel index stuf

Re: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache)

2010-04-08 Thread Michael McCandless
Actually Toke opened a new issue (LUCENE-2369) for the new approach to Locale-based sorting... I think we should leave the existing issue as the single-segment optimization (it's a separate issue). Mike On Thu, Apr 8, 2010 at 6:06 PM, Chris Hostetter wrote: > > : > Is it possible to change it? I

Re: Getting fsync out of the loop

2010-04-08 Thread Michael McCandless
On Thu, Apr 8, 2010 at 6:21 PM, Earwin Burrfoot wrote: >> But, IW doesn't let you "hold on to" checkpoints... only to commits. >> >> Ie SnapshotDP will only "see" actual commit/close calls, not >> intermediate checkpoints like a random segment merge completing, a >> flush happening, etc. >> >> Or.
