[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-06-20 Thread Jason van Zyl
To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 3 projects, and has been outstanding for 37 runs.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-20062007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -INFO- Failed to extract fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 1 min 18 secs
Command Line: /opt/jdk1.5/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/usr/local/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/x1/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=20062007 
-Djavacc.home=/usr/local/gump/packages/javacc-3.1 package 
[Working Directory: /usr/local/gump/public/workspace/lucene-java]
CLASSPATH: 
/opt/jdk1.5/lib/tools.jar:/usr/local/gump/public/workspace/lucene-java/build/classes/java:/usr/local/gump/public/workspace/lucene-java/build/classes/demo:/usr/local/gump/public/workspace/lucene-java/build/classes/test:/usr/local/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/usr/local/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/usr/local/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/queries/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/usr/local/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/usr/local
/gump/public/workspace/ant/dist/lib/ant-swing.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-trax.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-junit.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant.jar:/usr/local/gump/packages/junit3.8.1/junit.jar:/usr/local/gump/public/workspace/xml-commons/java/build/resolver.jar:/usr/local/gump/packages/je-1.7.1/lib/je.jar:/usr/local/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/usr/local/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-20062007.jar:/usr/local/gump/packages/javacc-3.1/bin/lib/javacc.jar:/usr/local/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/usr/local/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/usr/local/gump/public/workspace/junit/dist/junit-20062007.jar:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/usr/local/gump/public/workspa

[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Steven Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506576
 ] 

Steven Parkes commented on LUCENE-843:
--

I've started looking at this, and at what it would take to merge it with the 
merge policy work (LUCENE-847). I noticed that there are a couple of test failures?

> improve how IndexWriter uses RAM to buffer added documents
> --
>
> Key: LUCENE-843
> URL: https://issues.apache.org/jira/browse/LUCENE-843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.2
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-843.patch, LUCENE-843.take2.patch, 
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch, 
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch, 
> LUCENE-843.take9.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents.  I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
> use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
> in-RAM merges.  Once RAM is full, flush buffers to disk (and
> merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.
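The flush-by-RAM idea described above can be sketched outside Lucene. This is an illustrative toy only (all names here, such as RamFlushBuffer and estimateBytes, are hypothetical, not Lucene API): documents accumulate in a buffer whose estimated RAM footprint is tracked, and a flush is triggered when the estimate crosses a byte threshold rather than after a fixed document count.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a buffer that flushes when its estimated RAM
// footprint crosses a threshold, rather than after a fixed document count.
// Names (RamFlushBuffer, estimateBytes) are hypothetical, not Lucene API.
public class RamFlushBuffer {
    private final long ramThresholdBytes;
    private final List<String> buffered = new ArrayList<>();
    private long estimatedBytes = 0;
    private int flushCount = 0;

    public RamFlushBuffer(long ramThresholdBytes) {
        this.ramThresholdBytes = ramThresholdBytes;
    }

    // Rough per-document RAM estimate: 2 bytes per char for the text,
    // plus a fixed overhead for object headers and postings bookkeeping.
    private static long estimateBytes(String doc) {
        return 2L * doc.length() + 64;
    }

    public void addDocument(String doc) {
        buffered.add(doc);
        estimatedBytes += estimateBytes(doc);
        if (estimatedBytes >= ramThresholdBytes) {
            flush();
        }
    }

    private void flush() {
        // A real writer would write a segment here; the toy just clears the buffer.
        buffered.clear();
        estimatedBytes = 0;
        flushCount++;
    }

    public int flushCount() { return flushCount; }
    public int bufferedDocs() { return buffered.size(); }
}
```

The point of a byte threshold is that documents vary widely in size, so tracking estimated bytes follows actual memory pressure in a way a fixed document count cannot.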

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-843:
--

Attachment: index.presharedstores.nocfs.zip
index.presharedstores.cfs.zip

Oh, were the test failures only in TestBackwardsCompatibility?

Because I changed the index file format, I added 2 more ZIP files to
that unit test, but, "svn diff" doesn't pick up the new zip files.  So
I'm attaching them.  Can you pull off these zip files into your
src/test/org/apache/lucene/index and test again?  Thanks.



> improve how IndexWriter uses RAM to buffer added documents
> --
>
> Key: LUCENE-843
> URL: https://issues.apache.org/jira/browse/LUCENE-843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.2
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: index.presharedstores.cfs.zip, 
> index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch, 
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch, 
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch, 
> LUCENE-843.take9.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents.  I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
> use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
> in-RAM merges.  Once RAM is full, flush buffers to disk (and
> merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene 2.2.0 release available

2007-06-20 Thread Michael McCandless

One small change that I think we should make under "Lucene News" on
the web site is to change the name "Point-in-time searching" (this is
LUCENE-710, right?).

Lucene has always had this feature.  It's just that the implementation
previously relied on the specifics of how the underlying filesystem
handles deletion of open files.  Windows and UNIX have the "right"
semantics, but NFS (and maybe others) doesn't.

LUCENE-710 just enables you to make your own "custom deletion policy"
which would then allow an application to do "point in time" searching
over NFS, live backups, etc.

Maybe we should change this to "point in time searching over NFS" or
"custom index deletion policies" instead?

Mike
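A custom deletion policy of the kind Mike describes can be sketched as follows. This is an illustrative toy, not the actual Lucene 2.2 API: the CommitPoint interface and onCommit callback here are hypothetical stand-ins for Lucene's commit-point abstraction. The policy keeps the N most recent commits, so a reader that opened an older commit (e.g. over NFS, or for a live backup) still finds its files on disk.

```java
import java.util.List;

// Illustrative sketch, not the actual Lucene 2.2 API: a deletion policy
// that keeps the N most recent index commits so readers that opened a
// slightly older "point in time" (e.g. over NFS) still find their files.
// CommitPoint and the delete() callback are hypothetical stand-ins.
public class KeepLastNCommits {
    public interface CommitPoint {
        void delete(); // ask the writer to remove this commit's files
    }

    private final int keep;

    public KeepLastNCommits(int keep) {
        this.keep = keep;
    }

    // Called with commits ordered oldest-first; deletes all but the last N.
    // Returns how many commits were deleted.
    public int onCommit(List<? extends CommitPoint> commits) {
        int toDelete = Math.max(0, commits.size() - keep);
        for (int i = 0; i < toDelete; i++) {
            commits.get(i).delete();
        }
        return toDelete;
    }
}
```

The design choice is that deletion becomes a pluggable decision made over the list of live commits, instead of being hard-wired to "delete everything but the newest commit".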

"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On 6/19/07, DM Smith <[EMAIL PROTECTED]> wrote:
> > FYI, The announcement has not made it to the http://
> > lucene.apache.org/ page.
> 
> I just committed this.  It should be viewable in about an hour.
> 
> Note: I had to change the syntax slightly... I'm using forrest-0.8
> now, and apparently it doesn't allow  inside 
> 
> -Yonik
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Steven Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506609
 ] 

Steven Parkes commented on LUCENE-843:
--

Yeah, that was it.

I'll be delving more into the code as I try to figure out how it will dovetail 
with the merge policy refactoring.

> improve how IndexWriter uses RAM to buffer added documents
> --
>
> Key: LUCENE-843
> URL: https://issues.apache.org/jira/browse/LUCENE-843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.2
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: index.presharedstores.cfs.zip, 
> index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch, 
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch, 
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch, 
> LUCENE-843.take9.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents.  I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
> use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
> in-RAM merges.  Once RAM is full, flush buffers to disk (and
> merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Assigned: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

2007-06-20 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-933:
--

Assignee: Doron Cohen

> QueryParser can produce empty sub BooleanQueries when Analyzer produces no 
> tokens for input
> ---
>
> Key: LUCENE-933
> URL: https://issues.apache.org/jira/browse/LUCENE-933
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Doron Cohen
>
> as triggered by SOLR-261, if you have a query like this...
>+foo:BBB  +(yak:AAA  baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC" 
> portions of the query (posisbly because they are stop words) the resulting 
> query produced by the QueryParser will be...
>   +foo:BBB +()
> ...that is a BooleanQuery with two required clauses, one of which is an empty 
> BooleanQuery with no clauses.
> this does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing 
> encountering parens whose contents result in an empty BooleanQuery -- but 
> what exactly it should do in the following situations...
>  a)  +foo:BBB +()
>  b)  +foo:BBB ()
>  c)  +foo:BBB -()
> ...is up for interpretation.  I would think situation (b) clearly lends 
> itself to dropping the sub-BooleanQuery completely.  situation (c) may also 
> lend itself to that solution, since semantically it means "don't allow a match 
> on any queries in the empty set of queries".   I have no idea what the 
> "right" thing to do for situation (a) is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene 2.2.0 release available

2007-06-20 Thread Daniel Naber
On Wednesday 20 June 2007 03:01, Yonik Seeley wrote:

> > FYI, The announcement has not made it to the http://
> > lucene.apache.org/ page.
>
> I just committed this.  It should be viewable in about an hour.

The links to the new features don't work for me, I always end up on the API 
overview page. Shouldn't the links be e.g.

http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html

instead of

http://lucene.apache.org/java/2_2_0/api/index.html?org/apache/lucene/document/Field.html
?

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

2007-06-20 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506703
 ] 

Doron Cohen commented on LUCENE-933:


So an acceptable solution is:
  The query parser will ignore empty clauses (e.g. ' ( ) ') that result from 
word filtering, the same as it already does for single words. 

A straightforward fix is for QueryParser to avoid adding null (inner) queries 
to the (outer) clause sets. (It makes sense, too.)

However, this has a side effect: 
  For queries that become "empty" as a result of filtering (stopping), 
QueryParser would now return null. 

This is an API semantics change: applications that used to get a 
BooleanQuery with 0 clauses as the parse result would now get a null query. 

Here is a closer look at the behavior change:

Original behavior:
   (1)  parse(" ")  == ParseException
   (2)  parse("( )")  == ParseException
   (3)  parse("stop") == " "
(actually a boolean query with 0 clauses)
   (4)  parse("(stop)")  == " "
(actually a boolean query with 0 clauses)
   (5)  parse("a stop b") == "a b"
   (6)  parse("a (stop) b") == "a () b"   
(middle part is a boolean query with 0 clauses)
   (7)  parse("a ((stop)) b") == "a () b" 
(again middle part is a boolean query with 0 clauses)

Modified behavior:   
   (3)  parse("stop") == null
   (4)  parse("(stop)")  == null
   (6)  parse("a (stop) b") == "a b"   
   (7)  parse("a ((stop)) b") == "a b" 

I think the modified behavior is the right one - applications can test a query 
for being null and realize that it is a no-op. 

However, backwards compatibility is important: would this change break existing 
applications with annoying new NPEs?

As an alternative, QueryParser parse() methods can be modified to return a 
phony empty BQ instead of returning null, for the sake of backwards 
compatibility.

Thoughts?
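To make the two proposals concrete, here is a toy illustration (this is not Lucene's QueryParser; the stop list and string-based "queries" are stand-ins): stop words are dropped, parenthesized groups left empty by that filtering are dropped too, and the "nullify" variant maps a fully filtered query to null while the backwards-compatible variant returns an empty result.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy illustration of the two behaviors discussed above; this is NOT
// Lucene's QueryParser. Stop words are dropped, and any parenthesized
// group left empty by that filtering is dropped as well (including
// nested groups such as "((stop))").
public class EmptyClauseDemo {
    private static final Set<String> STOP = new HashSet<>(Arrays.asList("stop"));

    // Backwards-compatible style: an empty string stands in for a
    // BooleanQuery with zero clauses.
    public static String parse(String query) {
        String s = " " + query + " ";
        for (String stop : STOP) {
            s = s.replaceAll("\\b" + stop + "\\b", " ");
        }
        // Repeatedly remove groups that became empty, so nesting collapses.
        String prev;
        do {
            prev = s;
            s = s.replaceAll("\\(\\s*\\)", " ");
        } while (!s.equals(prev));
        return s.trim().replaceAll("\\s+", " ");
    }

    // "Nullify" style: null signals that the whole query filtered away.
    public static String parseNullify(String query) {
        String joined = parse(query);
        return joined.isEmpty() ? null : joined;
    }
}
```

This mirrors the table above: cases (6) and (7) become "a b" under either style, while cases (3) and (4) become null in the nullify style and an empty query in the backwards-compatible one.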

> QueryParser can produce empty sub BooleanQueries when Analyzer produces no 
> tokens for input
> ---
>
> Key: LUCENE-933
> URL: https://issues.apache.org/jira/browse/LUCENE-933
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Doron Cohen
>
> as triggered by SOLR-261, if you have a query like this...
>+foo:BBB  +(yak:AAA  baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC" 
> portions of the query (possibly because they are stop words) the resulting 
> query produced by the QueryParser will be...
>   +foo:BBB +()
> ...that is a BooleanQuery with two required clauses, one of which is an empty 
> BooleanQuery with no clauses.
> this does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing 
> encountering parens whose contents result in an empty BooleanQuery -- but 
> what exactly it should do in the following situations...
>  a)  +foo:BBB +()
>  b)  +foo:BBB ()
>  c)  +foo:BBB -()
> ...is up for interpretation.  I would think situation (b) clearly lends 
> itself to dropping the sub-BooleanQuery completely.  situation (c) may also 
> lend itself to that solution, since semantically it means "don't allow a match 
> on any queries in the empty set of queries".   I have no idea what the 
> "right" thing to do for situation (a) is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506718
 ] 

Michael McCandless commented on LUCENE-843:
---

> Yeah, that was it.

Phew!

> I'll be delving more into the code as I try to figure out how it will
> dovetail with the merge policy refactoring.

OK, thanks.  I am very eager to get some other eyeballs looking for
issues with this patch!

I *think* this patch and the merge policy refactoring should be fairly
separate.

With this patch, "flushing" RAM -> Lucene segment is no longer a
"mergeSegments" call, which I think simplifies IndexWriter.  Previously,
mergeSegments had lots of extra logic to tell whether it was merging RAM
segments (= a flush) or merging "real" segments, but now it's simpler
because mergeSegments really only merges segments.


> improve how IndexWriter uses RAM to buffer added documents
> --
>
> Key: LUCENE-843
> URL: https://issues.apache.org/jira/browse/LUCENE-843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.2
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: index.presharedstores.cfs.zip, 
> index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch, 
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch, 
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch, 
> LUCENE-843.take9.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents.  I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
> use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
> in-RAM merges.  Once RAM is full, flush buffers to disk (and
> merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene 2.2.0 release available

2007-06-20 Thread Michael Busch

Michael McCandless wrote:

Maybe we should change this to "point in time searching over NFS" or
"custom index deletion policies" instead?

  


Thanks for the feedback, Mike! I agree, "point-in-time searching over 
NFS" describes the new addition more accurately. I will change the news 
entry.

- Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene 2.2.0 release available

2007-06-20 Thread Michael Busch

Daniel Naber wrote:

On Wednesday 20 June 2007 03:01, Yonik Seeley wrote:

  
The links to the new features don't work for me, I always end up on the API 
overview page. Shouldn't the links be e.g.


http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html

instead of

http://lucene.apache.org/java/2_2_0/api/index.html?org/apache/lucene/document/Field.html
?

  


Hi Daniel,

that's strange. The links work for me in both Firefox and IE.

Anyway, I will change the links as you suggest to point to the no-frames 
version of the javadocs. Then those links shouldn't cause problems anymore. 
Thanks!

- Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

2007-06-20 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-933:
---

Attachment: lucene-933_nullify.patch
lucene-933_backwards_comapatible.patch

OK, attaching two different fixes (as discussed above): 
  (1)  lucene-933_backwards_comapatible.patch 
  (2)  lucene-933_nullify.patch

All tests pass with either of these.

The "nullify" approach requires more changes, especially in tests as well as in 
MemoryIndex. While fixing what was required for the tests to pass with this 
(nullifying) approach, I came to the conclusion that it is better to continue 
not returning null queries as the result of parsing; otherwise there will be 
lots of "noise". 

So I would like to commit patch (1), unless someone points out a problem that I 
missed.

> QueryParser can produce empty sub BooleanQueries when Analyzer produces no 
> tokens for input
> ---
>
> Key: LUCENE-933
> URL: https://issues.apache.org/jira/browse/LUCENE-933
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Doron Cohen
> Attachments: lucene-933_backwards_comapatible.patch, 
> lucene-933_nullify.patch
>
>
> as triggered by SOLR-261, if you have a query like this...
>+foo:BBB  +(yak:AAA  baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC" 
> portions of the query (possibly because they are stop words) the resulting 
> query produced by the QueryParser will be...
>   +foo:BBB +()
> ...that is a BooleanQuery with two required clauses, one of which is an empty 
> BooleanQuery with no clauses.
> this does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing 
> encountering parens whose contents result in an empty BooleanQuery -- but 
> what exactly it should do in the following situations...
>  a)  +foo:BBB +()
>  b)  +foo:BBB ()
>  c)  +foo:BBB -()
> ...is up for interpretation.  I would think situation (b) clearly lends 
> itself to dropping the sub-BooleanQuery completely.  situation (c) may also 
> lend itself to that solution, since semantically it means "don't allow a match 
> on any queries in the empty set of queries".   I have no idea what the 
> "right" thing to do for situation (a) is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Created: (LUCENE-936) Typo on query parser syntax web page.

2007-06-20 Thread Ken Ganong (JIRA)
Typo on query parser syntax web page.
-

 Key: LUCENE-936
 URL: https://issues.apache.org/jira/browse/LUCENE-936
 Project: Lucene - Java
  Issue Type: Bug
  Components: Website
Affects Versions: 2.2
Reporter: Ken Ganong
Priority: Trivial


On the web page 
http://lucene.apache.org/java/docs/queryparsersyntax.html#N10126 the text says:

"To search for documents that must contain "jakarta" and may contain "lucene" 
use the query:"

The example says:

+jakarta apache

The problem:
The example uses apache where the text says lucene.




[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506752
 ] 

Michael Busch commented on LUCENE-843:
--

Hi Mike,

my first comment on this patch is: Impressive!

It's also quite overwhelming at the beginning, but I'm trying to dig into it. 
I'll probably have more questions, here's the first one:

Does DocumentsWriter also solve the problem DocumentWriter had before 
LUCENE-880? I believe the answer is yes. Even though you close the TokenStreams 
in the finally clause of invertField() like DocumentWriter did before 880, this 
is safe, because addPosition() serializes the term strings and payload bytes 
into the posting hash table right away. Is that right?

> improve how IndexWriter uses RAM to buffer added documents
> --
>
> Key: LUCENE-843
> URL: https://issues.apache.org/jira/browse/LUCENE-843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.2
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: index.presharedstores.cfs.zip, 
> index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch, 
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch, 
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch, 
> LUCENE-843.take9.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents.  I haven't changed anything after that, e.g. how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
> use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
> in-RAM merges.  Once RAM is full, flush buffers to disk (and
> merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.
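The flush-by-RAM-usage idea above can be sketched in plain Java. This is 
illustrative only: the class and method names are hypothetical, and the byte 
accounting is a crude stand-in for the patch's real memory tracking:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: flush buffered documents when estimated RAM usage
// crosses a byte threshold, instead of after a fixed document count.
public class RamBufferSketch {
    private final long ramBufferBytes;  // analogous role to setRAMBufferSize
    private final List<String> buffered = new ArrayList<>();
    private long bytesUsed = 0;
    int flushCount = 0;

    RamBufferSketch(long ramBufferBytes) { this.ramBufferBytes = ramBufferBytes; }

    void addDocument(String doc) {
        buffered.add(doc);
        bytesUsed += 2L * doc.length();  // crude estimate: 2 bytes per char
        if (bytesUsed >= ramBufferBytes) {
            flush();
        }
    }

    private void flush() {
        // The real patch would write buffered postings to disk here and
        // recycle its buffers; the sketch just resets the counters.
        buffered.clear();
        bytesUsed = 0;
        flushCount++;
    }

    public static void main(String[] args) {
        RamBufferSketch w = new RamBufferSketch(100);  // tiny threshold for the demo
        for (int i = 0; i < 10; i++) {
            w.addDocument("0123456789");               // ~20 estimated bytes each
        }
        System.out.println(w.flushCount);              // prints: 2
    }
}
```

Tying the flush to RAM rather than document count gives uniform memory behavior 
for large and small documents alike, which is why setMaxBufferedDocs is 
deprecated by the patch.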




[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506778
 ] 

Michael Busch commented on LUCENE-843:
--

Mike,

The benchmarks you ran focus on measuring pure indexing performance. I 
think it would be interesting to know how big the speedup is in real-life 
scenarios, i.e. with StandardAnalyzer and maybe even HTML parsing. For sure 
the speedup will be less, but it should still be a significant improvement. Did 
you run those kinds of benchmarks already?

