Re: Fix to contrib/misc/HighFreqTerms.java

2010-04-17 Thread Michael McCandless
- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, April 14, 2010 3:50 PM To: java-dev@lucene.apache.org Subject: Re: Bug in contrib/misc/HighFreqTerms.java? OK I committed the fix.  I ran it on a flex wikipedia index I had... it produces output like this: body:[3c

Re: Proposal about Version API relaxation

2010-04-16 Thread Michael McCandless
Getting back to the stable/experimental branches... I think, with separate stable & experimental branches, development would/should be active on both branches. It'd depend on the feature... Eg today we'd have 3.x stable branch and the experimental branch (= trunk). Small features, bug fixes,

Re: Proposal about Version API relaxation

2010-04-16 Thread Michael McCandless
, Michael McCandless wrote: Getting back to the stable/experimental branches... I think, with separate stable & experimental branches, development would/should be active on both branches. It'd depend on the feature... Eg today we'd have 3.x stable branch and the experimental branch (= trunk

[jira] Commented: (LUCENE-2398) Improve tests to work easier from IDEs

2010-04-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857914#action_12857914 ] Michael McCandless commented on LUCENE-2398: This is a great cleanup Robert

Re: SnapshotDeletionPolicy throws NPE if no commit happened

2010-04-15 Thread Michael McCandless
Presumably you'd also hit this exception if the DP deletes all commit points, right? I like IllegalStateException. Mike 2010/4/15 Shai Erera ser...@gmail.com: BTW, even if it's a stupid thing to do, someone can today create SDP and call snapshot without ever creating IW. And it's not an

Re: TestCodecs running time

2010-04-15 Thread Michael McCandless
Yah :) TestStressIndexing2 is another slow one... I'll go fix it... Mike On Thu, Apr 15, 2010 at 2:15 AM, Shai Erera ser...@gmail.com wrote: See you already did that Mike :). Thanks ! now the tests run for 2s. Shai On Fri, Apr 9, 2010 at 12:49 PM, Michael McCandless luc

Re: Proposal about Version API relaxation

2010-04-15 Thread Michael McCandless
2010/4/15 Shai Erera ser...@gmail.com: One way is to define 'major' as X and minor X.Y, and another is to define major as 'X.Y' and minor as 'X.Y.Z'. I prefer the latter but don't have any strong feelings against the former. I prefer X.Y, ie, changes to Y only is a minor release (mostly bug

[jira] Resolved: (LUCENE-1278) Add optional storing of document numbers in term dictionary

2010-04-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1278. Resolution: Won't Fix I think the pulsing codec (wraps any other codec

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-04-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857373#action_12857373 ] Michael McCandless commented on LUCENE-2324: bq. The usual design is a queued

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-04-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857381#action_12857381 ] Michael McCandless commented on LUCENE-2324: {quote} i would love to be able

Re: Proposal about Version API relaxation

2010-04-15 Thread Michael McCandless
Unfortunately, live searching against an old index can get very hairy. EG look at what I had to do for the flex API on pre-flex index flex emulation layer. It's also not great because it gives the illusion that all is good, yet, you've taken a silent hit (up to ~10% or so) in your search perf.

Re: Proposal about Version API relaxation

2010-04-15 Thread Michael McCandless
On Thu, Apr 15, 2010 at 3:50 PM, Robert Muir rcm...@gmail.com wrote: for now simply moving analyzers to its own jar file would be a great step! +1 -- why not consolidate all analyzers now? (And fix indexer to require a minimal API = TokenStream minus reset & close). Mike

Re: Google-developed posting list encoding

2010-04-14 Thread Michael McCandless
Flex has already landed (in trunk, for 3.1), so this is just a matter of someone creating a codec using Group VarInt. Mike On Wed, Apr 14, 2010 at 4:58 AM, John Wang john.w...@gmail.com wrote: This would be something that's excellent for contribution after the Flex-Indexing support is added.
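
For readers unfamiliar with the encoding mentioned above, here is a minimal sketch of the Group VarInt byte layout: four ints share one tag byte holding four 2-bit lengths, followed by 1 to 4 bytes per value. This is not a Lucene codec. The class name GroupVarInt, the little-endian byte order and the tag bit order (value 0 in the lowest two bits) are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Lucene: encodes/decodes one group of four ints.
final class GroupVarInt {

    // Number of bytes (1..4) needed to hold v when treated as unsigned.
    static int byteLength(int v) {
        if ((v & 0xFFFFFF00) == 0) return 1;
        if ((v & 0xFFFF0000) == 0) return 2;
        if ((v & 0xFF000000) == 0) return 3;
        return 4;
    }

    // Writes one tag byte (2 bits per value, storing length - 1), then the value bytes.
    static byte[] encode(int[] four) {
        if (four.length != 4) throw new IllegalArgumentException("need exactly 4 values");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int tag = 0;
        for (int i = 0; i < 4; i++) {
            tag |= (byteLength(four[i]) - 1) << (i * 2);
        }
        out.write(tag);
        for (int i = 0; i < 4; i++) {
            int v = four[i];
            for (int n = byteLength(v); n > 0; n--) {
                out.write(v & 0xff); // low byte first (assumed order)
                v >>>= 8;
            }
        }
        return out.toByteArray();
    }

    // Reads the tag byte, then reassembles each value from its 1..4 bytes.
    static int[] decode(byte[] buf) {
        int[] result = new int[4];
        int tag = buf[0] & 0xff;
        int pos = 1;
        for (int i = 0; i < 4; i++) {
            int len = ((tag >>> (i * 2)) & 3) + 1;
            int v = 0;
            for (int b = 0; b < len; b++) {
                v |= (buf[pos++] & 0xff) << (8 * b);
            }
            result[i] = v;
        }
        return result;
    }
}
```

The point of the layout is that a decoder reads all four lengths from a single byte instead of testing a continuation bit on every byte, as plain VInt decoding does.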

[jira] Updated: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-04-14 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2387: --- Attachment: LUCENE-2387-29x.patch 29x version of this patch. IndexWriter retains

Re: Bug in contrib/misc/HighFreqTerms.java?

2010-04-14 Thread Michael McCandless
Ugh, I'll fix this. With the new flex API, you can't ask a composite (Multi/DirReader) for its postings -- you have to go through the static methods on MultiFields. I'm trying to put some distance b/w IndexReader and composite readers... because I'd like to eventually deprecate them. Ie, the

Re: Bug in contrib/misc/HighFreqTerms.java?

2010-04-14 Thread Michael McCandless
72] 536480 body:[55 6e 69 74 65 64] 543746 Which is not very readable, but, it does this because flex terms are arbitrary byte[], not necessarily utf8... maybe we should fix it to print both hex and String if we assume bytes are utf8? Mike On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless luc
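
A minimal sketch of the "print both hex and String" idea floated above, assuming the term arrives as a plain byte[]. The class name TermBytesPrinter is hypothetical and this is not the code that was committed to HighFreqTerms. It always prints the hex form and appends the decoded text only when the bytes are well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene.
final class TermBytesPrinter {

    // Space-separated lowercase hex in brackets, e.g. [55 6e 69 74 65 64].
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Returns the decoded text, or null if the bytes are not well-formed UTF-8.
    static String tryUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Hex always; quoted string appended only when decoding succeeds.
    static String describe(byte[] bytes) {
        String text = tryUtf8(bytes);
        return text == null ? toHex(bytes) : toHex(bytes) + " \"" + text + "\"";
    }
}
```

With this sketch the bytes 55 6e 69 74 65 64 from the output above would print together with their decoded text, United, while a term that is not valid UTF-8 would fall back to hex only.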

Re: Proposal about Version API relaxation

2010-04-14 Thread Michael McCandless
On Wed, Apr 14, 2010 at 12:06 AM, Marvin Humphrey mar...@rectangular.com wrote: Essentially, we're free to break back compat within Lucy at any time, but we're not able to break back compat within a stable fork like Lucy1, Lucy2, etc. So what we'll probably do during normal development with

[jira] Commented: (LUCENE-2393) Utility to output total term frequency and df from a lucene index

2010-04-14 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857121#action_12857121 ] Michael McCandless commented on LUCENE-2393: Programmatically indexing those

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-04-14 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857124#action_12857124 ] Michael McCandless commented on LUCENE-2324: This is awesome Michael! Much

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856357#action_12856357 ] Michael McCandless commented on LUCENE-2386: Patch looks good Shai

[jira] Resolved: (LUCENE-2111) Wrapup flexible indexing

2010-04-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2111. Resolution: Fixed Wrapup flexible indexing

[jira] Commented: (LUCENE-2371) Update fileformats spec to match how flex's standard codec writes terms

2010-04-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856479#action_12856479 ] Michael McCandless commented on LUCENE-2371: Reminder to future self: make

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855905#action_12855905 ] Michael McCandless commented on LUCENE-2392: bq. Mike, I don't think

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855906#action_12855906 ] Michael McCandless commented on LUCENE-2392: bq. I think what I'm saying

[jira] Commented: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855916#action_12855916 ] Michael McCandless commented on LUCENE-2316: bq. I'm also ok w/ the bw break

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856022#action_12856022 ] Michael McCandless commented on LUCENE-2386: Shai, can you also test CREATE

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856065#action_12856065 ] Michael McCandless commented on LUCENE-2386: Yeah I think new IW(), set

[jira] Commented: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855736#action_12855736 ] Michael McCandless commented on LUCENE-2316: I don't think Lucene relies

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855738#action_12855738 ] Michael McCandless commented on LUCENE-2386: I like the fix (catching

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855747#action_12855747 ] Michael McCandless commented on LUCENE-2386: Actually I consider this a bug

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855750#action_12855750 ] Michael McCandless commented on LUCENE-2386: Shai I think you should also

[jira] Created: (LUCENE-2392) Enable flexible scoring

2010-04-11 Thread Michael McCandless (JIRA)
Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless

[jira] Updated: (LUCENE-2392) Enable flexible scoring

2010-04-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2392: --- Attachment: LUCENE-2392.patch Rough first patch attached Enable flexible

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855786#action_12855786 ] Michael McCandless commented on LUCENE-2386: Patch looks good... thanks Shai

Re: IndexWriter memory leak?

2010-04-10 Thread Michael McCandless
app! On Fri, Apr 9, 2010 at 12:32 PM, Michael McCandless luc...@mikemccandless.com wrote: I agree IW should not hold refs to the Field instances from the last doc indexed... I put a patch on LUCENE-2387 to null the reference as we go.  Can you confirm this lets GC reclaim? Mike On Fri, Apr

[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space

2010-04-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855548#action_12855548 ] Michael McCandless commented on LUCENE-2376: Yes total unique fields are 4

Re: TestCodecs running time

2010-04-09 Thread Michael McCandless
It's also slow because it repeats all the tests for each of the core codecs (standard, sep, pulsing, intblock). I think it's fine to reduce the number of iterations -- just make sure there's no seed to newRandom() so the distributed testing is effective. Mike On Fri, Apr 9, 2010 at 12:43 AM,

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855333#action_12855333 ] Michael McCandless commented on LUCENE-2386: bq. This is a behavioral bw break

[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855336#action_12855336 ] Michael McCandless commented on LUCENE-2376: Hmm indeed you have a great many

[jira] Assigned: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2387: -- Assignee: Michael McCandless IndexWriter retains references to Readers used

[jira] Updated: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2387: --- Attachment: LUCENE-2387.patch Attached patch nulls out the Fieldable reference

Re: IndexWriter memory leak?

2010-04-09 Thread Michael McCandless
I agree IW should not hold refs to the Field instances from the last doc indexed... I put a patch on LUCENE-2387 to null the reference as we go. Can you confirm this lets GC reclaim? Mike On Fri, Apr 9, 2010 at 12:54 AM, Ruben Laguna ruben.lag...@gmail.com wrote: But the Readers I'm talking

[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855343#action_12855343 ] Michael McCandless commented on LUCENE-2364: Maybe we should simply deprecate

[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855347#action_12855347 ] Michael McCandless commented on LUCENE-2387: I agree, Uwe -- I'll fold

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855364#action_12855364 ] Michael McCandless commented on LUCENE-2386: How about we subclass FNFE? Eg

[jira] Resolved: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2387. Resolution: Fixed Fix Version/s: 3.1 IndexWriter retains references

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855421#action_12855421 ] Michael McCandless commented on LUCENE-2386: Patch looks good! Hmm... maybe

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855470#action_12855470 ] Michael McCandless commented on LUCENE-2386: I think oal.index is good

[jira] Commented: (LUCENE-2372) Replace deprecated TermAttribute by new CharTermAttribute

2010-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855492#action_12855492 ] Michael McCandless commented on LUCENE-2372: +1 to making KeywordAnalyzer

Re: Getting fsync out of the loop

2010-04-08 Thread Michael McCandless
On Wed, Apr 7, 2010 at 3:27 PM, Earwin Burrfoot ear...@gmail.com wrote: No, this doesn't make sense.  The OS detects a disk full on accepting the write into the write cache, not [later] on flushing the write cache to disk.  If the OS accepts the write, then disk is not full (ie flushing the

[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space

2010-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854876#action_12854876 ] Michael McCandless commented on LUCENE-2376: OK but I suspect the root cause

Re: Move NoDeletionPolicy to core

2010-04-08 Thread Michael McCandless
+1 I don't think bw needs to be kept -- contrib/benchmark is allowed to change. Mike On Thu, Apr 8, 2010 at 5:44 AM, Shai Erera ser...@gmail.com wrote: Hi I've noticed benchmark has a NoDeletionPolicy class and I was wondering if we can move it to core. I might want to use it for the

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855135#action_12855135 ] Michael McCandless commented on LUCENE-2386: I agree: IW really should

Re: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache)

2010-04-08 Thread Michael McCandless
Actually Toke opened a new issue (LUCENE-2369) for the new approach to Locale-based sorting... I think we should leave the existing issue as the single-segment optimization (it's a separate issue). Mike On Thu, Apr 8, 2010 at 6:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is it

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855215#action_12855215 ] Michael McCandless commented on LUCENE-2386: I think the patch is good Shai

Re: Getting fsync out of the loop

2010-04-08 Thread Michael McCandless
On Thu, Apr 8, 2010 at 6:21 PM, Earwin Burrfoot ear...@gmail.com wrote: But, IW doesn't let you hold on to checkpoints... only to commits. Ie SnapshotDP will only see actual commit/close calls, not intermediate checkpoints like a random segment merge completing, a flush happening, etc.

[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854398#action_12854398 ] Michael McCandless commented on LUCENE-2376: Is this the same issue as LUCENE

[jira] Commented: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854401#action_12854401 ] Michael McCandless commented on LUCENE-2377: Patch looks good Shai! Enable

Re: Getting fsync out of the loop

2010-04-07 Thread Michael McCandless
On Tue, Apr 6, 2010 at 7:26 PM, Earwin Burrfoot ear...@gmail.com wrote: Running out of disk space with fsync disabled won't lead to corruption. Even kill -9 the JRE process with fsync disabled won't corrupt. In these cases index just falls back to last successful commit. It's only power loss

[jira] Commented: (LUCENE-2373) Change StandardTermsDictWriter to work with streaming and append-only filesystems

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854409#action_12854409 ] Michael McCandless commented on LUCENE-2373: I would love to make Lucene truly

[jira] Created: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-04-07 Thread Michael McCandless (JIRA)
Cutover remaining usage of pre-flex APIs Key: LUCENE-2378 URL: https://issues.apache.org/jira/browse/LUCENE-2378 Project: Lucene - Java Issue Type: Improvement Reporter: Michael

[jira] Created: (LUCENE-2379) TermRangeQuery & FieldCacheRangeFilter should accepts BytesRef

2010-04-07 Thread Michael McCandless (JIRA)
: Improvement Components: Search Affects Versions: 3.1 Reporter: Michael McCandless Fix For: 3.1 With flex, a term is a byte[] (BytesRef) not a String... we need to push this up the search stack. TermRangeQuery / FieldCacheRangeFilter.newStringRange now take a String

[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854575#action_12854575 ] Michael McCandless commented on LUCENE-2364: Also

[jira] Resolved: (LUCENE-2379) TermRangeQuery & FieldCacheRangeFilter should accepts BytesRef

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2379. Resolution: Duplicate Woops -- dup of LUCENE-2364. TermRangeQuery

[jira] Created: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-04-07 Thread Michael McCandless (JIRA)
Reporter: Michael McCandless Fix For: 3.1 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode string, but not necessarily), so we need to push this up the search stack. FieldCache now has getStrings and getStringIndex; we need corresponding methods

[jira] Created: (LUCENE-2381) Use packed ints for sort ords (in FieldCache.getStringIndex/.getTermBytesIndex)

2010-04-07 Thread Michael McCandless (JIRA)
- Java Issue Type: Improvement Reporter: Michael McCandless Fix For: 3.1 We wastefully use a whole int today, but for enumerated fields (eg country, state, color, category) this is very wasteful since you could use only a few bits per doc when
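
To make the "few bits per doc" point concrete, here is a minimal sketch of a fixed-width packed array. It is not Lucene's packed ints implementation. The class name PackedOrds and its methods are hypothetical. A field with 5 distinct values has ords 0..4 and so needs 3 bits per doc rather than 32.

```java
// Hypothetical helper, not part of Lucene: stores small non-negative values
// using a fixed number of bits each, packed back to back into a long[].
final class PackedOrds {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedOrds(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..32");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    // Bits needed to represent every value in 0..maxValue (at least 1).
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    void set(int index, long value) {
        value &= mask;
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next long
        if (spill > 0) {
            int lowBits = bitsPerValue - spill;
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> lowBits)) | (value >>> lowBits);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            v |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return v & mask;
    }
}
```

Under these assumptions, one million docs at 3 bits each take about 375 KB instead of the 4 MB an int[] would use.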

[jira] Created: (LUCENE-2382) Merging implemented by codecs must catch aborted merges

2010-04-07 Thread Michael McCandless (JIRA)
Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1 This is a regression (we lost functionality on landing flex). When you close IW with false (meaning abort all running merges), IW asks the merge threads to abort. The threads

[jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854612#action_12854612 ] Michael McCandless commented on LUCENE-1536: With flex, you can now get

[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854615#action_12854615 ] Michael McCandless commented on LUCENE-2380: We could also do shared byte

[jira] Created: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Michael McCandless (JIRA)
Some small fixes after the flex merge... Key: LUCENE-2383 URL: https://issues.apache.org/jira/browse/LUCENE-2383 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless

[jira] Updated: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2383: --- Attachment: LUCENE-2383.patch Some small fixes after the flex merge

Re: Commit freeze in flex branch

2010-04-07 Thread Michael McCandless
Yes +1 to that -- thanks Uwe!! And thanks for the many other people who helped out on flex. It's a big and exciting improvement :) Mike On Wed, Apr 7, 2010 at 4:11 PM, Michael Busch busch...@gmail.com wrote: Uwe, thanks for doing all the svn work!  Was a smooth transition!  Michael On

[jira] Commented: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854684#action_12854684 ] Michael McCandless commented on LUCENE-2383: Thanks Uwe, I agree that's

[jira] Resolved: (LUCENE-2383) Some small fixes after the flex merge...

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2383. Resolution: Fixed Some small fixes after the flex merge

[jira] Commented: (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.

2010-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854717#action_12854717 ] Michael McCandless commented on LUCENE-2364: Once we fix Term to take

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853811#action_12853811 ] Michael McCandless commented on LUCENE-2361: Hmm are you sure you're setting

[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853815#action_12853815 ] Michael McCandless commented on LUCENE-2329: bq. We could move

[jira] Updated: (LUCENE-2265) improve automaton performance by running on byte[]

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2265: --- Attachment: LUCENE-2265.patch bq. The problem is it does not handle at least

[jira] Updated: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2329: --- Attachment: LUCENE-2329.patch New patch, init'ing the postings arrays in THPF.start

[jira] Created: (LUCENE-2371) Update fileformats spec to match how flex's standard codec writes terms

2010-04-06 Thread Michael McCandless (JIRA)
Issue Type: Bug Components: Website Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1 The standard codec changes how the terms index is written (eg uses packed ints, writes a whole field's terms at once, etc.)... we have to fix

[jira] Commented: (LUCENE-2370) Reintegrate flex branch into trunk

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854112#action_12854112 ] Michael McCandless commented on LUCENE-2370: The bug is LUCENE-1976 -- after

[jira] Reopened: (LUCENE-1976) isCurrent() and getVersion() on an NRT reader are broken

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-1976: Reopening to fix on 3.1 after flex lands... isCurrent() and getVersion() on an NRT

[jira] Updated: (LUCENE-1976) isCurrent() and getVersion() on an NRT reader are broken

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1976: --- Fix Version/s: 3.1 isCurrent() and getVersion() on an NRT reader are broken

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854120#action_12854120 ] Michael McCandless commented on LUCENE-2361: Hmm but the above infoStream

Re: Getting fsync out of the loop

2010-04-06 Thread Michael McCandless
On Tue, Apr 6, 2010 at 10:11 AM, Earwin Burrfoot ear...@gmail.com wrote: So, I want to pump my IndexWriter hard and fast with documents. Nice. Removing fsync from FSDirectory helps. But for that I pay with possibility of index corruption, not only if my node suddenly loses

[jira] Resolved: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2329. Resolution: Fixed Third time's a charm? Use parallel arrays instead

[jira] Resolved: (LUCENE-1976) isCurrent() and getVersion() on an NRT reader are broken

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1976. Resolution: Fixed OK fixed on 3.1. isCurrent() and getVersion() on an NRT

[jira] Commented: (LUCENE-1990) Add unsigned packed int impls in oal.util

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854174#action_12854174 ] Michael McCandless commented on LUCENE-1990: OK indeed now I can see

[jira] Resolved: (LUCENE-1990) Add unsigned packed int impls in oal.util

2010-04-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1990. Resolution: Fixed Fix Version/s: (was: Flex Branch

[jira] Commented: (LUCENE-2365) Finding Newest Segment In Empty Index

2010-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853497#action_12853497 ] Michael McCandless commented on LUCENE-2365: Thanks, patch looks good; I'll

[jira] Assigned: (LUCENE-2365) Finding Newest Segment In Empty Index

2010-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2365: -- Assignee: Michael McCandless Finding Newest Segment In Empty Index

[jira] Reopened: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-2329: Reopening -- this fix causes an intermittent deadlock in TestStressIndexing2. It's

[jira] Updated: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2329: --- Attachment: LUCENE-2329.patch Use parallel arrays instead of PostingList objects

[jira] Resolved: (LUCENE-2365) Finding Newest Segment In Empty Index

2010-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2365. Resolution: Fixed Fix Version/s: (was: 3.0.1) 3.1

[jira] Updated: (LUCENE-2265) improve automaton performance by running on byte[]

2010-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2265: --- Attachment: LUCENE-2265.patch Patch w/ first cut at method to cutover whole UTF32

Re: Term space continuity

2010-04-05 Thread Michael McCandless
The flex API isolates fields, ie you get a TermsEnum for a given field and it enums only the term's text (as a BytesRef). Mike On Mon, Apr 5, 2010 at 7:22 PM, Earwin Burrfoot ear...@gmail.com wrote: A random thought from some of the earlier discussions. Had anybody used the fact that Lucene

[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-04-04 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853273#action_12853273 ] Michael McCandless commented on LUCENE-2354: Patch looks good Uwe! Convert

Re: Incremental Field Updates

2010-04-03 Thread Michael McCandless
On Sat, Apr 3, 2010 at 1:25 AM, Babak Farhang farh...@gmail.com wrote: I think they get merged in by the merger, ideally in the background. That sounds sensible. (In other words, we won't concern ourselves with roll backs--something possible while a layer is still around.) Actually roll backs

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853110#action_12853110 ] Michael McCandless commented on LUCENE-2361: Can you share some details on how

[jira] Commented: (LUCENE-2362) Add support for slow filters with batch processing

2010-04-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853111#action_12853111 ] Michael McCandless commented on LUCENE-2362: I think in general Lucene should
