Lucene 2.3 RC2 available

2008-01-11 Thread Michael Busch
Hi all, I just uploaded Lucene 2.3 release candidate 2 to http://people.apache.org/~buschmi/staging_area/lucene_2_3/rc2. Changes compared to RC1: - LUCENE-1125: fix over-zero-filling that was drastically slowing down small docs w/ term vectors - LUCENE-1117: fix EnwikiDocMaker to not hang when

Re: Javadocs and Nightly Builds

2008-01-11 Thread Michael Busch
Chris Hostetter wrote: i trust you :) Thanks! :) i figured it was a work in progress, i just wasn't sure what your plan was for what docs would live in the site and which would live in the release. When i poked around your preview directory earlier i found two copies of

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557935#action_12557935 ] Michael Busch commented on LUCENE-584: -- {quote} As for PrefixGenerator: in my (up to

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-584: - Attachment: lucene-584-take4-part2.patch lucene-584-take4-part1.patch {quote}

[jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2008-01-11 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557937#action_12557937 ] Michael Busch commented on LUCENE-510: -- I think it makes total sense to change this.
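
The distinction behind the LUCENE-510 title (a string's length in chars versus its length in bytes) can be illustrated with a small sketch. This is not Lucene code: the class `StringLengths` and its methods are hypothetical names, and standard UTF-8 is used purely as an assumption for the illustration, with no claim about the encoding the index format uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: contrasts the two notions of
// "length" that a writeString()-style method could record.
final class StringLengths {
    private StringLengths() {}

    // Number of UTF-16 code units, i.e. what String.length() reports.
    static int charLength(String s) {
        return s.length();
    }

    // Number of bytes once the string is encoded as standard UTF-8
    // (an assumption made for this sketch only).
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For pure ASCII text the two values agree, but for a string such as "\u20ac" (the euro sign) the char length is 1 while the UTF-8 byte length is 3, so a reader that only knows the char count cannot skip over the encoded string without decoding it.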

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557955#action_12557955 ] Paul Elschot commented on LUCENE-584: - I'm sorry about my PrefixGenerator remarks, I

[jira] Assigned: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2008-01-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-510: - Assignee: Michael McCandless IndexOutput.writeString() should write length in

[jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2008-01-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557953#action_12557953 ] Michael McCandless commented on LUCENE-510: --- Yup, I'll take this!

Re: [jira] Resolved: (LUCENE-559) Turkish Analyzer for Lucene

2008-01-11 Thread Shai Erera
Why not use the SnowballAnalyzer for Turkish? Snowball recently added a Turkish stemmer. On Jan 10, 2008 8:51 PM, Grant Ingersoll (JIRA) [EMAIL PROTECTED] wrote: [ https://issues.apache.org/jira/browse/LUCENE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Grant

Re: Javadocs and Nightly Builds

2008-01-11 Thread Grant Ingersoll
It is svn exported from a crontab on p.a.o under my account. If you can tell me what commands need to be run, I can add it. Presumably the new version of Hudson has the ability to do scp now, but I haven't had a chance to tackle that yet. At some point, we should tackle that, which will

[jira] Commented: (LUCENE-1126) Simplify StandardTokenizer JFlex grammar

2008-01-11 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557988#action_12557988 ] Steven Rowe commented on LUCENE-1126: - In part my imprecise characterization of the

[jira] Resolved: (LUCENE-82) [PATCH] HTMLParser: IOException: Pipe closed

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-82. --- Resolution: Won't Fix Assignee: (was: Lucene Developers) Seems like this issue

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557999#action_12557999 ] Eks Dev commented on LUCENE-584: it looks like ChainedFilter could become obsolete if

[jira] Commented: (LUCENE-1126) Simplify StandardTokenizer JFlex grammar

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557996#action_12557996 ] Grant Ingersoll commented on LUCENE-1126: - {quote} I'm not positive, but couldn't

[jira] Updated: (LUCENE-1127) TokenSources.getTokenStream(Document...)

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1127: Attachment: LUCENE-1127.patch Patch applies from the contrib/highlighter directory.

[jira] Created: (LUCENE-1127) TokenSources.getTokenStream(Document...)

2008-01-11 Thread Grant Ingersoll (JIRA)
TokenSources.getTokenStream(Document...) - Key: LUCENE-1127 URL: https://issues.apache.org/jira/browse/LUCENE-1127 Project: Lucene - Java Issue Type: Improvement Components: contrib/*

Re: Build failed in Hudson: Lucene-Nightly #334

2008-01-11 Thread Nigel Daley
I had to kill the nightly Lucene build. It had hung for over 10 hours trying to run hudson 17572 0.0 0.35851235460 ?S 07:58:15 0:13 /export/home/hudson/tools/java/jdk1.6.0_03/bin/javadoc -d /export/home/hudson/hudson/jobs/Lucene-Nightly/workspace/trunk/build/docs/api/contrib-

Build failed in Hudson: Lucene-Nightly #334

2008-01-11 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/334/changes Changes: [buschmi] Add separate core, demo, and contrib javadocs to binary releases [buschmi] Rename README files to uppercase letters [buschmi] Rename README files to uppercase letters [buschmi] Include README*

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: ContribQueries20080111.patch I tried to move contrib from Filter.bits() to

[jira] Created: (LUCENE-1128) Add Highlighting benchmark support to contrib/benchmark

2008-01-11 Thread Grant Ingersoll (JIRA)
Add Highlighting benchmark support to contrib/benchmark --- Key: LUCENE-1128 URL: https://issues.apache.org/jira/browse/LUCENE-1128 Project: Lucene - Java Issue Type: Improvement

Re: Lucene 2.3 RC2 available

2008-01-11 Thread Michael Busch
Good idea! I just sent a mail to java-user. -Michael Michael McCandless wrote: Michael, Do you think we should include java-user in this pre-release testing? It seems like it can only help us to ferret out issues, and, keeps our users informed/excited about an upcoming release? Mike

Re: [jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2008-01-11 Thread Michael Busch
Cool! Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557953#action_12557953 ] Michael McCandless commented on LUCENE-510:

[jira] Updated: (LUCENE-1128) Add Highlighting benchmark support to contrib/benchmark

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1128: Attachment: LUCENE-1128.patch First draft of adding highlighter support to benchmarker.

[jira] Created: (LUCENE-1130) Hitting disk full during DocumentWriter.ThreadState.init(...) can cause hang

2008-01-11 Thread Michael McCandless (JIRA)
Hitting disk full during DocumentWriter.ThreadState.init(...) can cause hang Key: LUCENE-1130 URL: https://issues.apache.org/jira/browse/LUCENE-1130 Project: Lucene - Java

[jira] Commented: (LUCENE-1130) Hitting disk full during DocumentWriter.ThreadState.init(...) can cause hang

2008-01-11 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558054#action_12558054 ] Michael Busch commented on LUCENE-1130: --- {quote} and I think we should also push

Re: Lucene 2.3 RC2 available

2008-01-11 Thread Michael McCandless
Hmm ... I'll try to repro this and track it down. Mike Steven A Rowe wrote: Hi Michael, On 01/11/2008 at 3:34 AM, Michael Busch wrote: I just uploaded Lucene 2.3 release candidate 2 to http://people.apache.org/~buschmi/staging_area/lucene_2_3/rc2. Please switch to RC2 and keep testing!

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558076#action_12558076 ] Paul Elschot commented on LUCENE-584: - {quote} it looks like ChainedFilter could become

Add numDeletedDocs() to IndexReader

2008-01-11 Thread Shai Erera
Hi guys, I had a need to know how many deleted documents are in the index. I noticed there isn't an API for it in IndexReader, however the information can be obtained by calling IndexReader.maxDoc() - IndexReader.numDocs(). Do you think it's worth adding such an API to IndexReader? Cheers, Shai
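
The calculation described in the message above can be sketched as follows. This is a minimal illustration, not the patch proposed for IndexReader: the class `DeletedDocs` is a hypothetical name, and the two int parameters stand in for the values that `IndexReader.maxDoc()` and `IndexReader.numDocs()` would return.

```java
// Hypothetical helper, not part of Lucene: derives the deleted-document
// count from the two values an IndexReader already exposes.
final class DeletedDocs {
    private DeletedDocs() {}

    // maxDoc  : stand-in for IndexReader.maxDoc(), which counts deleted
    //           documents that have not yet been merged away.
    // numDocs : stand-in for IndexReader.numDocs(), the live documents.
    static int numDeletedDocs(int maxDoc, int numDocs) {
        if (numDocs < 0 || numDocs > maxDoc) {
            throw new IllegalArgumentException(
                "expected 0 <= numDocs <= maxDoc, got numDocs=" + numDocs
                    + ", maxDoc=" + maxDoc);
        }
        return maxDoc - numDocs;
    }
}
```

With a real reader the call would presumably look like `DeletedDocs.numDeletedDocs(reader.maxDoc(), reader.numDocs())`; a dedicated method would mostly save callers from repeating the subtraction.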

RE: Lucene 2.3 RC2 available

2008-01-11 Thread Steven A Rowe
Hi Michael, On 01/11/2008 at 3:34 AM, Michael Busch wrote: I just uploaded Lucene 2.3 release candidate 2 to http://people.apache.org/~buschmi/staging_area/lucene_2_3/rc2. Please switch to RC2 and keep testing! (The report below is not about binary release testing, but rather running the

[jira] Commented: (LUCENE-1128) Add Highlighting benchmark support to contrib/benchmark

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558053#action_12558053 ] Grant Ingersoll commented on LUCENE-1128: - Note, this patch also assumes

[jira] Created: (LUCENE-1129) ReadTask ignores traversalSize

2008-01-11 Thread Grant Ingersoll (JIRA)
ReadTask ignores traversalSize -- Key: LUCENE-1129 URL: https://issues.apache.org/jira/browse/LUCENE-1129 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Grant

Re: Javadocs and Nightly Builds

2008-01-11 Thread Michael Busch
Grant Ingersoll wrote: It is svn exported from a crontab on p.a.o under my account. If you can tell me what commands need to be run, I can add it. You just need to svn co the docs from the new location https://svn.apache.org/repos/asf/lucene/java/site/docs and scp it to

[jira] Updated: (LUCENE-1130) Hitting disk full during DocumentWriter.ThreadState.init(...) can cause hang

2008-01-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1130: --- Attachment: LUCENE-1130.patch I created two test cases that show the issue patch

[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader

2008-01-11 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558113#action_12558113 ] Shai Erera commented on LUCENE-1131: This is an option, however it will result in two

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558115#action_12558115 ] Eks Dev commented on LUCENE-584: hmm, in order to have fast and/or operations we need to

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Test20080111.patch) Decouple Filter from BitSet

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: Test20080111.patch I moved Filter forward by removing the deprecated bits() method and

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: Test20080111.patch Upload once more, this time with licence. Decouple Filter from

[jira] Commented: (LUCENE-1128) Add Highlighting benchmark support to contrib/benchmark

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558112#action_12558112 ] Grant Ingersoll commented on LUCENE-1128: - Patch does not work with term.vectors =

[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader

2008-01-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558111#action_12558111 ] Yonik Seeley commented on LUCENE-1131: -- How about just using maxDoc() - numDocs()?

[jira] Commented: (LUCENE-1129) ReadTask ignores traversalSize

2008-01-11 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558104#action_12558104 ] Doron Cohen commented on LUCENE-1129: - {quote} so we just need to use this value in

RE: Lucene 2.3 RC2 available

2008-01-11 Thread Steven A Rowe
Hi Mike, I couldn't get the patch to apply (word wrapping/line endings/whatever), so I just manually pasted in the added line after deleting the corresponding removed line. With the patched version, I got 0 failures out of 20 runs. After I reverted back to the original version, I got 10

[jira] Updated: (LUCENE-1128) Add Highlighting benchmark support to contrib/benchmark

2008-01-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1128: Attachment: LUCENE-1128.patch Task now requires docs to be stored (either for analyzing

Re: Javadocs and Nightly Builds

2008-01-11 Thread Grant Ingersoll
On Jan 11, 2008, at 1:11 PM, Michael Busch wrote: Grant Ingersoll wrote: It is svn exported from a crontab on p.a.o under my account. If you can tell me what commands need to be run, I can add it. You just need to svn co the docs from the new location

[jira] Created: (LUCENE-1131) Add numDeletedDocs to IndexReader

2008-01-11 Thread Shai Erera (JIRA)
Add numDeletedDocs to IndexReader - Key: LUCENE-1131 URL: https://issues.apache.org/jira/browse/LUCENE-1131 Project: Lucene - Java Issue Type: New Feature Reporter: Shai Erera

Re: Add numDeletedDocs() to IndexReader

2008-01-11 Thread Shai Erera
Interesting you mention numDeletedDocs w.r.t. optimize - I need that information for exactly the same reason. Is there any good rule of thumb for knowing when it's best to call optimize? I know that during the internal merges Lucene does, deleted docs are removed. However there are those large

Re: Add numDeletedDocs() to IndexReader

2008-01-11 Thread Otis Gospodnetic
I think that's useful (for knowing when it's time to optimize), though I thought I added something like that a long time ago... maybe on some local version... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Shai Erera [EMAIL PROTECTED] To:

Re: Javadocs and Nightly Builds

2008-01-11 Thread Michael Busch
Grant Ingersoll wrote: Done. For the record, the script runs: Cool, thanks! /usr/local/bin/svn export --force http://svn.apache.org/repos/asf/lucene/java/site/docs /www/lucene.apache.org/java/docs Another question: Do we also want to have a link to the unreleased documentation on

Re: Lucene 2.3 RC2 available

2008-01-11 Thread Michael McCandless
Alas, so far I cannot repro this. But I did see one off-by-one error in an assert in the test. Steve, could you try applying this patch and see if the failure still happens? Thanks: Index: src/test/org/apache/lucene/index/TestDeletionPolicy.java

Re: Add numDeletedDocs() to IndexReader

2008-01-11 Thread Shai Erera
If that's the case, then I can open an issue and create the appropriate patch. On Jan 11, 2008 9:43 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: I think that's useful (for knowing when it's time to optimize), though I thought I added something like that a long time ago... maybe on some local

[jira] Commented: (LUCENE-644) Contrib: another highlighter approach

2008-01-11 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558066#action_12558066 ] Mark Miller commented on LUCENE-644: Yes it is still an issue. It's been a while since

Re: Lucene 2.3 RC2 available

2008-01-11 Thread Michael McCandless
OK, phew :) I'll commit to trunk. Michael, is it OK to commit to 2.3 too? Mike On Jan 11, 2008, at 3:27 PM, Steven A Rowe wrote: Hi Mike, I couldn't get the patch to apply (word wrapping/line endings/whatever), so I just manually pasted in the added line after deleting the

Re: Weight/Strength/Ranking of entered query

2008-01-11 Thread Chris Hostetter
http://people.apache.org/~hossman/#java-dev Please use java-user not java-dev Your question is better suited for the [EMAIL PROTECTED] mailing list ... not the [EMAIL PROTECTED] list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the

[jira] Updated: (LUCENE-1131) Add numDeletedDocs to IndexReader

2008-01-11 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1131: --- Attachment: LUCENE-1131.patch A very simple patch that implements numDeletedDocs in all the

Build failed in Hudson: Lucene-Nightly #335

2008-01-11 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/335/changes Changes: [buschmi] Generate nightly documentation files. [mikemccand] fix off-by-one error [buschmi] Separate project's web site from version-specific documentation. --