[jira] Commented: (LUCENE-689) NullPointerException thrown by equals method in SpanOrQuery

2008-11-17 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648240#action_12648240 ] Steven Parkes commented on LUCENE-689: -- Thanks, Otis. Was just gonna remove mysel

[jira] Updated: (LUCENE-870) add concurrent merge policy

2007-08-20 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-870: - Attachment: CMP.patch.txt Mike expressed interest in pursuing this with an alternative strategy

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-18 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520881 ] Steven Parkes commented on LUCENE-847: -- my feeling is we should not deprecate

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520408 ] Steven Parkes commented on LUCENE-847: -- I don't think so: I think if someone changes the merge p

[jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-08-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520360 ] Steven Parkes commented on LUCENE-845: -- I think the combination of these two changes should give a net

[jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-08-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520336 ] Steven Parkes commented on LUCENE-845: -- Here's an idea: maybe we can accept the O(N^2) merge cost,

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520293 ] Steven Parkes commented on LUCENE-847: -- One new small item: you've added a "public void m

[jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-08-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520268 ] Steven Parkes commented on LUCENE-845: -- I understand the merge problem but I'm still concerned abou

[jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-08-15 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520130 ] Steven Parkes commented on LUCENE-845: -- This increases file descriptor usage in some cases, right? In the old

[jira] Updated: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-15 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-847: - Attachment: LUCENE-847.patch.txt Updated patch: * Don't call deprecated methods -

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-14 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519793 ] Steven Parkes commented on LUCENE-847: -- It just occurred to me that there is a neat way to handle

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-14 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519790 ] Steven Parkes commented on LUCENE-847: -- Are you going to fix all unit tests that call the now

[jira] Updated: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-08-08 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-971: - Attachment: LUCENE-971.patch.txt All agreed and fixed. Thanks. > Create enwiki indexable d

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518251 ] Steven Parkes commented on LUCENE-847: -- Ah. I understand better now. I have to admit, I haven't kept up to

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518222 ] Steven Parkes commented on LUCENE-847: -- On a related note, Mike, there are a few FIXMEs in IW relat

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518210 ] Steven Parkes commented on LUCENE-847: -- Is the separate IndexMerger interface really necessary? I wrestled

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518165 ] Steven Parkes commented on LUCENE-847: -- I think we ideally would like concurrency to be fully

[jira] Updated: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-08-06 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-971: - Attachment: LUCENE-971.patch.txt Okay. Here's an update to the patch. Change

[jira] Updated: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-08-06 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-971: - Assignee: Steven Parkes Lucene Fields: [Patch Available] (was: [Patch Available, New

[jira] Commented: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-08-06 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518016 ] Steven Parkes commented on LUCENE-971: -- Sounds good. New patch soon. > Create enwiki indexable data as l

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-06 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518013 ] Steven Parkes commented on LUCENE-847: -- For the time being, the patch also contains some of the code from

[jira] Updated: (LUCENE-870) add concurrent merge policy

2007-08-06 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-870: - Attachment: concurrentMerge.patch Copy Ning's concurrency patch over here, since LUCENE-8

[jira] Updated: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-06 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-847: - Attachment: LUCENE-847.patch.txt Here's an update to the patch. I wouldn't say it

RE: [VOTE] Migrate Lucene to JDK 1.5 for 3.0 release

2007-08-02 Thread Steven Parkes
H ... just a nit (or did I miss something?) in (2), do you mean 2.3? -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Thursday, August 02, 2007 5:29 AM To: java-dev@lucene.apache.org Subject: Re: [VOTE] Migrate Lucene to JDK 1.5 for 3.0 release OK, I think al

[jira] Commented: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-08-01 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516997 ] Steven Parkes commented on LUCENE-971: -- I can look at what it would take to avoid the line file ... but

[jira] Updated: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-07-31 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-971: - Attachment: LUCENE-971.patch.txt > Create enwiki indexable data as line-per-article rather t

[jira] Created: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article

2007-07-31 Thread Steven Parkes (JIRA)
Issue Type: Improvement Reporter: Steven Parkes Create a line per article rather than a file. Consume with indexLineFile task.

RE: Token termBuffer issues

2007-07-26 Thread Steven Parkes
First I create a single large file that has one doc per line from Wikipedia content, using this alg Anybody disagree that the 1-line-per-doc format is better (at least for Wikipedia)? If so, I'll get rid of the intermediate one-file-per-doc step. --

[jira] Updated: (LUCENE-962) I/O exception in DocsWriter add or updateDocument may not delete unreferenced files

2007-07-17 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-962: - Attachment: LUCENE-962.patch.txt Patch adds wrappers in IndexWriter to catch exceptions thrown

[jira] Created: (LUCENE-962) I/O exception in DocsWriter add or updateDocument may not delete unreferenced files

2007-07-17 Thread Steven Parkes (JIRA)
: Lucene - Java Issue Type: Bug Affects Versions: 2.2 Reporter: Steven Parkes Assignee: Steven Parkes Priority: Minor Fix For: 2.3 If an I/O exception is thrown in DocumentsWriter#addDocument or #updateDocument, the stored fields files

binary at the front of CHANGES.txt

2007-07-17 Thread Steven Parkes
Can we get rid of the binary characters at the front of CHANGES.txt? Or do they mean something?

[jira] Updated: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-12 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-938: - Attachment: LUCENE-938.patch.txt New patch. Removes support for buf deletes around transactions

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-12 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512289 ] Steven Parkes commented on LUCENE-938: -- Works for me. I'll submit a new patch. > I/O exceptions can ca

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-12 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512274 ] Steven Parkes commented on LUCENE-938: -- Okay. Got it. But your earlier note got me thinking. Mike, as far as I

[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-07-12 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512264 ] Steven Parkes commented on LUCENE-843: -- Did we lose the triggered merge stuff from 887, i.e., should it be

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-12 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512249 ] Steven Parkes commented on LUCENE-938: -- Easy first: there's a comment in the code about cloning the buf d

[jira] Commented: (LUCENE-953) Snowball has new Stemmers available

2007-07-10 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511592 ] Steven Parkes commented on LUCENE-953: -- See also LUCENE-740. > Snowball has new Stemmers availa

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-04 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510235 ] Steven Parkes commented on LUCENE-938: -- Thanks. I figured there'd be conflicts. I won't be able

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-28 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508833 ] Steven Parkes commented on LUCENE-848: -- Trying to reproduce now. Something that came up while restarting the

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-27 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508661 ] Steven Parkes commented on LUCENE-848: -- Actually, I just noticed wikimedia provides the md5 hashes. I was able

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-26 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508327 ] Steven Parkes commented on LUCENE-848: -- Let me see if I can replicate. Can you do a sha1sum on your enwiki

[jira] Updated: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-06-21 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-938: - Attachment: LUCENE-938.txt This version has the missing fixes that got tossed when I tried to

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-06-21 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507052 ] Steven Parkes commented on LUCENE-938: -- Only it's broke. Mixing a couple of things, I missed a couple of

[jira] Updated: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-06-21 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-938: - Lucene Fields: [New, Patch Available] (was: [New]) Affects Version/s: (was: 2.3

[jira] Updated: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-06-21 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-938: - Attachment: LUCENE-938.txt Patch that fixes the two relevant rollback mechanisms in IndexWriter

[jira] Created: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-06-21 Thread Steven Parkes (JIRA)
Affects Versions: 2.3 Reporter: Steven Parkes Assignee: Steven Parkes Fix For: 2.3 Some I/O exceptions that result in segmentInfos rollback operations can cause buffered deletes that existed before the rollback creation point to be incorrectly lost when the

[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506609 ] Steven Parkes commented on LUCENE-843: -- Yeah, that was it. I'll be delving more into the code as I t

[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506576 ] Steven Parkes commented on LUCENE-843: -- I've started looking at this, what it would take to merge wit

RE: [jira] Commented: (LUCENE-930) fail build if contrib tests fail to compile

2007-06-18 Thread Steven Parkes
Alright, I think this patch looks good. Maybe Steven could take a look as well? Then I think you can go ahead and also commit it to the 2.2 branch. Sorry; I was on vacation last week. Thanks for picking up the slack, Chris. -

[jira] Commented: (LUCENE-930) fail build if contrib tests fail to compile

2007-06-08 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502959 ] Steven Parkes commented on LUCENE-930: -- Thanks, Chris. Sorry for the extra steps. > fail build if cont

[jira] Reopened: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes reopened LUCENE-885: -- Lucene Fields: [Patch Available] (was: [New]) Reopening to get these tweaks in. > clean

[jira] Updated: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-885: - Attachment: (was: LUCENE-885.patch) > clean up build files so contrib tests are run m

[jira] Updated: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-885: - Attachment: LUCENE-885-pt2.patch Renaming the patch to make it clear it adds to the previous

[jira] Updated: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-885: - Attachment: LUCENE-885.patch This patch file removes the swallowed failures by doing more of the

[jira] Commented: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502576 ] Steven Parkes commented on LUCENE-885: -- Whoops, the primary failure was because I wasn't up to date. Act

[jira] Commented: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502570 ] Steven Parkes commented on LUCENE-885: -- Ah, I see. Should, then, test-contrib depend on build-contrib, rather

[jira] Commented: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-06-07 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502551 ] Steven Parkes commented on LUCENE-885: -- Looking at the current build (r545324) it looks like some contrib

RE: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-05 Thread Steven Parkes
Key: LUCENE-848 > URL: https://issues.apache.org/jira/browse/LUCENE-848 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/benchmark > Reporter: Steven Parkes >Assignee: Grant I

[jira] Commented: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java -> Kraaij-Pohlmann gives Index-OOB Exception

2007-06-05 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501703 ] Steven Parkes commented on LUCENE-740: -- Do we want to consider this a candidate for 2.2? In any case, the

[jira] Commented: (LUCENE-908) Lucli doesn't include standard MANIFEST.MF

2007-06-05 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501702 ] Steven Parkes commented on LUCENE-908: -- I'm pretty sure it's to get the stuff that's in the

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-04 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501344 ] Steven Parkes commented on LUCENE-848: -- It looks like the latest successful dump is http

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-01 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500903 ] Steven Parkes commented on LUCENE-848: -- I'll leave the hosting site to others; I don't know enough ab

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-06-01 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500896 ] Steven Parkes commented on LUCENE-848: -- Grant was looking at hosting a copy of the dataset on zones so that

[jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-05-31 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500521 ] Steven Parkes commented on LUCENE-763: -- Can we also update the javadocs to reflect the different semantics

RE: addIndexes()

2007-05-31 Thread Steven Parkes
riginal Message- From: Andi Vajda [mailto:[EMAIL PROTECTED] Sent: Thursday, May 31, 2007 10:10 AM To: java-dev@lucene.apache.org Subject: Re: addIndexes() On Thu, 31 May 2007, Doug Cutting wrote: > Steven Parkes wrote: >> Is there any particular reason that the version that takes a

addIndexes()

2007-05-30 Thread Steven Parkes
I'm cleaning up the patch for LUCENE-847 (factored merge policy) and noticed a couple of things about the addIndexes methods. Is there any particular reason that the version that takes a Directory[] optimizes first? The later merge is going to use the normal logarithmic stepping; is there a compel

RE: IndexWriter shutdown

2007-05-23 Thread Steven Parkes
> Or instead of "shutdown" it's more of a "interrupt the > merge if it's in progress" which then doesn't prevent further IO? At a high level, this would seem like the most valuable approach. But I think we would want to distinguish between writing new documents and merges of existing segments. The

RE: IndexWriter shutdown

2007-05-22 Thread Steven Parkes
> I'm not certain, but would parts of your goal be achieved by the work I've > seen floating around Jira to refactor the MergePolicy so that it can be > handled by multiple threads? Well, in what I've been working on for LUCENE-847 (merge policy factoring) and LUCENE-870 (concurrent merge policy),

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-05-02 Thread Steven Parkes
ansactions. I'm pretty wary of touching that code. Is there a way around that? -Original Message- From: Ning Li [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 02, 2007 7:54 AM To: java-dev@lucene.apache.org Subject: Re: [jira] Commented: (LUCENE-847) Factor merge policy out of Ind

RE: [jira] Created: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize

2007-05-02 Thread Steven Parkes
As I understand it (correct me if I'm wrong, Mike), there will be a cascading merge in this case, but not an optimize, because you don't merge all (twenty) segments at each level, just half of them (ten). -Original Message- From: Ning Li [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 02,

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-05-02 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Lucene Fields: [Patch Available] > Add supported for Wikipedia English as a corpus in

[jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-04-30 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492814 ] Steven Parkes commented on LUCENE-845: -- Following up on this, it's basically the idea that segments ought

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-30 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt Close to http://java.sun.com/docs/codeconv/html/CodeConventions.doc7

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-30 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492756 ] Steven Parkes commented on LUCENE-848: -- Ath. That would be because I was thinking vertically, not horizontally

[jira] Commented: (LUCENE-870) add concurrent merge policy

2007-04-27 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492376 ] Steven Parkes commented on LUCENE-870: -- Sigh. My typo rate's been too high lately. The depends-on link

[jira] Created: (LUCENE-870) add concurrent merge policy

2007-04-27 Thread Steven Parkes (JIRA)
add concurrent merge policy --- Key: LUCENE-870 URL: https://issues.apache.org/jira/browse/LUCENE-870 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Steven Parkes

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-27 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt Well, here's a version with less whitespace. But, I have to

RE: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-24 Thread Steven Parkes
I think. I can ask on legal-discuss. This is the license I found: http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GFDL found via http://en.wikipedia.org/wiki/Wikipedia:Database_download On Apr 24, 2007, at 6:01 PM, Steven Parkes wrote: > They don't seem to keep things arou

RE: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-24 Thread Steven Parkes
urse, that may not be a good idea bandwidth wise. I'm open to suggestions. Maybe using the latest isn't that big of a deal. On Apr 24, 2007, at 2:45 PM, Steven Parkes (JIRA) wrote: > > [ https://issues.apache.org/jira/browse/LUCENE-848? > page=com.atlassian.jira.plugin.syst

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-24 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt Here's the patch with the README. By the way, there's als

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-24 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491396 ] Steven Parkes commented on LUCENE-848: -- Yeah, it takes a while to download. I added the jars since that's

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-24 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491393 ] Steven Parkes commented on LUCENE-848: -- Both jars are from xerces-2.9.0. > Add supported for Wikipedia Engl

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-23 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491080 ] Steven Parkes commented on LUCENE-848: -- yeah; think so; it worked for my benchmarking stuff on a couple of

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-04-19 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490192 ] Steven Parkes commented on LUCENE-847: -- Here are some numbers comparing the load performance for the factored

RE: optimize() method call

2007-04-18 Thread Steven Parkes
I think it can be greater than linear. It would be linear if optimize only copied each segment into the result. However, it will only merge maxMerge segments at a time, so in some cases, some segment data is going to be copied more than once. So something like O(n log n)? -Original Message- F

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-18 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: xml-apis.jar Now I see the button for attach multiple files. Oh, well. Anyway, both

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-18 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: xerces.jar > Add supported for Wikipedia English as a corpus in the benchmar

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-18 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt Upgrade to Xerces 2. Xerces 1 passes the sanity check, but fails for

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-17 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt Okay, I've tested this patch against 1.4, 1.5, and 1.6. I

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-17 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: xerces.jar Here's the version of xerces that I used, to go in contrib/benchmar

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489283 ] Steven Parkes commented on LUCENE-848: -- Blah. This patch doesn't work quite right with 1.4. My intention w

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-16 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt Update of the previous patch. Used Doron's suggestion for var

[jira] Created: (LUCENE-864) contrib/benchmark files need eol-style set

2007-04-16 Thread Steven Parkes (JIRA)
/benchmark Affects Versions: 2.1 Reporter: Steven Parkes Priority: Minor The following files in contrib/benchmark don't have eol-style set to native, so when they are checked out, they don't get converted. ./build.xml: ./C

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487609 ] Steven Parkes commented on LUCENE-848: -- That's what I meant (and did). If it's okay, I'll bun

[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487600 ] Steven Parkes commented on LUCENE-848: -- By the way, that's a rough patch. I'm cleaning it up as I

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-09 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Attachment: LUCENE-848.txt This patch is a first cut at Wikipedia benchmark support. It downloads

RE: contrib/benchmark - DOS line endings

2007-04-02 Thread Steven Parkes
I think you need to make the files consistent first; something like dos2unix or unix2dos should do this, if they're available on your machine. -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Monday, April 02, 2007 3:47 PM To: java-dev@lucene.apache.org Subject: c

RE: [jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-02 Thread Steven Parkes
Yes, indeed. May not be necessary initially, but we could support XPath or something down the road to allow us to specify what things > I wouldn't worry about generalizing too much > to start with. Once we have a couple collections then we can go that > route. My thoughts, too. I've been

RE: [jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-02 Thread Steven Parkes
7, at 1:09 PM, Steven Parkes (JIRA) wrote: > > [ https://issues.apache.org/jira/browse/LUCENE-848? > page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] > > Steven Parkes updated LUCENE-848: > - > > Description: Ad

[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-03-28 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Description: Add support for using Wikipedia for benchmarking. (was: Add support for using
