Re: IndexFileNames

2010-02-23 Thread Michael McCandless
This class makes me somewhat nervous, with the changes coming in flex, because the extensions are no longer static but rather a function of the particular codec you're using in the index. I've changed some of the constants accordingly (on flex). Still, I think it's OK to make it public (flex has

Re: IndexFileNames

2010-02-23 Thread Shai Erera
But segmentFileName performs the concatenation internally: static String segmentFileName(String segmentName, String ext) { return segmentName + "." + ext; } So that would not avoid anything right? And still, if someone needs to know whether a certain file is a core Lucene file or his own a

Re: IndexFileNames

2010-02-23 Thread Michael McCandless
Well, there are two issues:
* We don't always call IFN.segmentFileName -- often we simply concatenate strings directly throughout the sources.
* Should the extensions include '.' or not?
I'd really like to fix the first issue. If we do that, then there's only 1 place that performs string co

Re: Looks like we missed a little change for 3.0 ...

2010-02-23 Thread Michael McCandless
Sigh... yes, better to turn these into Jira issues in general. We could make the change under Version? (Change to true, starting in 3.1). Or maybe not make the change. If set to true, we use pct deletion on a segment to reduce its perceived size when selecting merges, which generally causes seg
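The effect described above, where a segment's deletion percentage shrinks its perceived size during merge selection, can be sketched roughly as follows. The message is truncated and gives no formula, so the linear scaling is an assumption, and PerceivedSizeSketch and perceivedSize are hypothetical names, not Lucene's merge-policy code.

```java
// Sketch only: a hypothetical helper, not Lucene's merge-policy code.
final class PerceivedSizeSketch {

    // Scales a segment's byte size by its fraction of live (non-deleted) docs.
    // Assumption: linear scaling; the truncated message does not give the formula.
    static long perceivedSize(long sizeInBytes, int docCount, int deletedDocs) {
        if (docCount <= 0) {
            return sizeInBytes; // nothing to scale by
        }
        double delRatio = (double) deletedDocs / docCount;
        return (long) (sizeInBytes * (1.0 - delRatio));
    }

    public static void main(String[] args) {
        // A 1000-byte segment with 25 of 100 docs deleted is treated as 750 bytes.
        System.out.println(perceivedSize(1000L, 100, 25)); // prints 750
    }
}
```

Under this assumption, a segment carrying many deletions looks smaller to the selection logic than its on-disk size suggests.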

Re: IndexFileNames

2010-02-23 Thread Shai Erera
I don't think performance is the issue here, but rather correctness. Someone cannot just ask filename.endsWith(DELETION_EXT), as files like "file1del" would match as well. So whenever you make such a check, you need to add ".". Again, not performance, but correctness. If we make it public, then we ca
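The correctness problem raised above can be shown with a minimal sketch. FileNameSketch and hasExtension are hypothetical names for illustration, and the value "del" for DELETION_EXT is an assumption inferred from the "file1del" example; only segmentFileName mirrors the method quoted earlier in this thread.

```java
// Sketch only: a hypothetical helper, not Lucene's actual IndexFileNames class.
final class FileNameSketch {

    // Assumed value: the extension is stored without the leading dot.
    static final String DELETION_EXT = "del";

    // Same concatenation as the segmentFileName quoted earlier in the thread.
    static String segmentFileName(String segmentName, String ext) {
        return segmentName + "." + ext;
    }

    // Hypothetical check that adds the dot in one place, so callers cannot forget it.
    static boolean hasExtension(String fileName, String ext) {
        return fileName.endsWith("." + ext);
    }

    public static void main(String[] args) {
        // The naive check matches a file that merely ends in the letters "del".
        System.out.println("file1del".endsWith(DELETION_EXT));      // prints true
        // The dotted check rejects it.
        System.out.println(hasExtension("file1del", DELETION_EXT)); // prints false
        // A name built through segmentFileName is still recognized.
        System.out.println(hasExtension(segmentFileName("_1", DELETION_EXT), DELETION_EXT)); // prints true
    }
}
```

Keeping both the concatenation and the check in one class means the "." convention lives in a single place, which is the point both posters are circling.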

[jira] Commented: (LUCENE-2280) IndexWriter.optimize() throws NullPointerException

2010-02-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837204#action_12837204 ] Michael McCandless commented on LUCENE-2280: Are you sure you're using a stock

Re: IndexFileNames

2010-02-23 Thread Michael McCandless
On Tue, Feb 23, 2010 at 6:46 AM, Shai Erera wrote:
> I don't think performance is the issue here, but rather correctness. Someone
> cannot just ask filename.endsWith(DELETION_EXT), as files like "file1del"
> would match as well. So whenever you make such a check, you need to add ".".
> Again, not per

Add IndexWriter.doBeforeFlush()

2010-02-23 Thread Shai Erera
Hi, Can we add to IW a doBeforeFlush(), similar to doAfterFlush(), which will get called before flush actually happens (i.e., at the beginning of flush())? IW.flush() is final and so I cannot override it, but I do take advantage of doAfterFlush(). I need though a way to execute some code before th
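The request above amounts to a pair of overridable hooks around a final method. A minimal sketch of that shape follows, using a hypothetical WriterSketch class rather than the real IndexWriter; the events list exists only to make the call order visible.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical stand-in with a final flush(), not Lucene's IndexWriter.
class WriterSketch {
    final List<String> events = new ArrayList<>();

    // Hooks are no-ops by default; subclasses override them.
    protected void doBeforeFlush() {}

    protected void doAfterFlush() {}

    // flush() stays final, so subclasses customize behavior only through the hooks.
    public final void flush() {
        doBeforeFlush();
        events.add("flush"); // placeholder for the real flush work
        doAfterFlush();
    }
}

// Hypothetical subclass that runs code on both sides of the flush.
class AuditingWriter extends WriterSketch {
    @Override
    protected void doBeforeFlush() {
        events.add("before");
    }

    @Override
    protected void doAfterFlush() {
        events.add("after");
    }
}

class HookDemo {
    public static void main(String[] args) {
        AuditingWriter writer = new AuditingWriter();
        writer.flush();
        System.out.println(writer.events); // prints [before, flush, after]
    }
}
```

Making the hooks protected, as proposed later in the thread, lets subclasses in other packages override them without widening the public API.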

Re: IndexFileNames

2010-02-23 Thread Shai Erera
OK, great! I'll create an issue and work out a patch.
Shai
On Tue, Feb 23, 2010 at 1:52 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Tue, Feb 23, 2010 at 6:46 AM, Shai Erera wrote:
> > I don't think performance is the issue here, but rather correctness. Someone
> > cannot

[jira] Commented: (LUCENE-2278) FastVectorHighlighter: highlighted term is out of alignment in multi-valued NOT_ANALYZED field

2010-02-23 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837213#action_12837213 ] Koji Sekiguchi commented on LUCENE-2278: I'll commit in a few days. > FastVectorH

[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-23 Thread Renaud Delbru (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837231#action_12837231 ] Renaud Delbru commented on LUCENE-1410: I am reporting here some experiments I perf

Re: Add IndexWriter.doBeforeFlush()

2010-02-23 Thread Michael McCandless
+1 to both adding doBeforeFlush and making the two methods protected. Patch?
Mike
On Tue, Feb 23, 2010 at 6:55 AM, Shai Erera wrote:
> Hi,
>
> Can we add to IW a doBeforeFlush(), similar to doAfterFlush(), which will
> get called before flush actually happens (i.e., at the beginning of
> flush()

[jira] Updated: (LUCENE-1410) PFOR implementation

2010-02-23 Thread Renaud Delbru (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Renaud Delbru updated LUCENE-1410:
Attachment: for-summary.txt
File containing a summary of the results for each bit frame, based

Re: Add IndexWriter.doBeforeFlush()

2010-02-23 Thread Shai Erera
On it!
Shai
On Tue, Feb 23, 2010 at 3:11 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> +1 to both adding doBeforeFlush and making the two methods protected.
> Patch?
>
> Mike
>
> On Tue, Feb 23, 2010 at 6:55 AM, Shai Erera wrote:
> > Hi,
> >
> > Can we add to IW a doBeforeFlush(

[jira] Created: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Shai Erera (JIRA)
Add doBeforeFlush to IndexWriter
Key: LUCENE-2281
URL: https://issues.apache.org/jira/browse/LUCENE-2281
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shai E

[jira] Updated: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-2281:
Attachment: LUCENE-2281.patch
* Added doBeforeFlush + call to it in doFlushInternal
* Changed doAfte

[jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util

2010-02-23 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Toke Eskildsen updated LUCENE-1990:
Attachment: LUCENE-1990-te20100223.patch
I've renamed most of the classes to short form, as

[jira] Commented: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837284#action_12837284 ] Michael McCandless commented on LUCENE-2281: Patch looks good... thanks Shai!

[jira] Assigned: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless reassigned LUCENE-2281:
Assignee: Michael McCandless
> Add doBeforeFlush to IndexWriter

RE: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts - Take #2

2010-02-23 Thread Uwe Schindler
Hi all, I also checked the release artifacts in my projects and can conclude that the 3.0.1 version works correctly for me. 2.9.x is no longer in use here, but both -src artifact files build and test correctly. Signatures and hashes are fine. So a non-counting +1 from me (non-PMC). Uwe

[jira] Resolved: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-2281.
Resolution: Fixed
Fix Version/s: 3.0.1
Thanks Shai!
> Add doBeforeFlush to

[jira] Updated: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-2281:
Fix Version/s: (was: 3.0.1)
Woops, not on 3.0.1 (likely).
> Add doBeforeFlush t

[jira] Updated: (LUCENE-2281) Add doBeforeFlush to IndexWriter

2010-02-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-2281:
Fix Version/s: 3.0.2
> Add doBeforeFlush to IndexWriter

Re: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts - Take #2

2010-02-23 Thread Michael McCandless
+1 to release. I used each version's binary release to build & search a 5M wikipedia index. Search performance is the same for TermQuery with both releases, but PhraseQuery (at least the 3 simple 2-word phrases I tested) was ~9% faster (20.49 QPS -> 22.29 QPS). Not sure why... but it's movin
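As a quick check on the "~9%" figure above, the relative change between the two quoted QPS numbers can be computed directly. This is plain arithmetic on the numbers in the message; QpsDeltaSketch and percentChange are hypothetical names.

```java
import java.util.Locale;

// Sketch only: arithmetic on the two QPS figures quoted in the message.
final class QpsDeltaSketch {

    // Relative change from "before" to "after", in percent.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double pct = percentChange(20.49, 22.29);
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pct)); // prints 8.8%
    }
}
```

The result, about 8.8%, is consistent with the rounded "~9%" in the message.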

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-23 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2111:
Attachment: LUCENE-2111_bytesRef.patch
Here is a rough patch that merges BytesRef/UnicodeUtil
* a

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837359#action_12837359 ] Michael McCandless commented on LUCENE-2111: Patch looks good Robert! Thanks

[jira] Commented: (LUCENE-2279) eliminate pathological performance on StopFilter when using a Set instead of CharArraySet

2010-02-23 Thread thushara wijeratna (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837373#action_12837373 ] thushara wijeratna commented on LUCENE-2279: isn't the reusableTokenStream cr

[jira] Commented: (LUCENE-2279) eliminate pathological performance on StopFilter when using a Set instead of CharArraySet

2010-02-23 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837410#action_12837410 ] Robert Muir commented on LUCENE-2279: reusableTokenStream() is called again for each

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-23 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837438#action_12837438 ] Robert Muir commented on LUCENE-2111: ok, i ditched UTF8 result entirely, committed i

[jira] Created: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
Expose IndexFileNames as public, and make use of its methods in the code
Key: LUCENE-2282
URL: https://issues.apache.org/jira/browse/LUCENE-2282
Project: Lucene - Java

[jira] Commented: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837446#action_12837446 ] Marvin Humphrey commented on LUCENE-2282: It seems to me that identifying only co

[jira] Updated: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-2282:
Attachment: LUCENE-2282.patch
Patch provides:
* IFN constants and methods as public
* segmentFileNam

[jira] Commented: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837455#action_12837455 ] Shai Erera commented on LUCENE-2282: bq. What are the applications that we are trying

[jira] Updated: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-2282:
Attachment: LUCENE-2282.patch
Forgot to tag IFN as @lucene.internal
> Expose IndexFileNames as publ

[jira] Created: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter

2010-02-23 Thread Tim Smith (JIRA)
Possible Memory Leak in StoredFieldsWriter
Key: LUCENE-2283
URL: https://issues.apache.org/jira/browse/LUCENE-2283
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 2.4.1
Repo

[jira] Updated: (LUCENE-2279) eliminate pathological performance on StopFilter when using a Set instead of CharArraySet

2010-02-23 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2279: Priority: Minor (was: Major) > eliminate pathological performance on StopFilter when usin

[jira] Commented: (LUCENE-2279) eliminate pathological performance on StopFilter when using a Set instead of CharArraySet

2010-02-23 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837465#action_12837465 ] Simon Willnauer commented on LUCENE-2279: I don't consider this as an issue at al

[jira] Commented: (LUCENE-2279) eliminate pathological performance on StopFilter when using a Set instead of CharArraySet

2010-02-23 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837467#action_12837467 ] Robert Muir commented on LUCENE-2279: in my opinion the issue states one of my bigges

[jira] Commented: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837499#action_12837499 ] Marvin Humphrey commented on LUCENE-2282: > Any application that extends IW, or p

[jira] Updated: (LUCENE-2167) StandardTokenizer Javadoc does not correctly describe tokenization around punctuation characters

2010-02-23 Thread Shyamal Prasad (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shyamal Prasad updated LUCENE-2167:
Attachment: LUCENE-2167.patch
Hi Robert, It's been a while but I finally got around to wor

[jira] Commented: (LUCENE-2167) StandardTokenizer Javadoc does not correctly describe tokenization around punctuation characters

2010-02-23 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837603#action_12837603 ] Robert Muir commented on LUCENE-2167: bq. Clearly, much of this is an opinion, so I f

[jira] Commented: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837624#action_12837624 ] Shai Erera commented on LUCENE-2282: bq. The thing is, I really don't understand what

[jira] Issue Comment Edited: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837624#action_12837624 ] Shai Erera edited comment on LUCENE-2282 at 2/24/10 4:37 AM:

[jira] Updated: (LUCENE-2282) Expose IndexFileNames as public, and make use of its methods in the code

2010-02-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-2282:
Attachment: LUCENE-2282.patch
Updated TestFileSwitchDirectory to use the constants instead of hard c

RE: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts - Take #2

2010-02-23 Thread Uwe Schindler
Hi all,
I got three positive votes from:
- Andi Vajda
- Mike McCandless
- Ted Dunning
I will copy the release artifacts to the apache dist server this evening and let the mirroring start. During that time I will prepare the website changes and will announce the release as planned.
- Uwe Sc