[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed
To whom it may engage...

This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration. This issue affects 3 projects, and has been outstanding for 37 runs. The current state of this project is 'Failed', with reason 'Build Failed'.

For reference only, the following projects are affected by this:
    - eyebrowse : Web-based mail archive browsing
    - jakarta-lucene : Java Based Search Engine
    - lucene-java : Java Based Search Engine

Full details are available at:
    http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here. The following annotations (debug/informational/warning/error messages) were provided:
 -DEBUG- Sole output [lucene-core-20062007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
-INFO- Failed with reason build failed -INFO- Failed to extract fallback artifacts from Gump Repository The following work was performed: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html Work Name: build_lucene-java_lucene-java (Type: Build) Work ended in a state of : Failed Elapsed: 1 min 18 secs Command Line: /opt/jdk1.5/bin/java -Djava.awt.headless=true -Xbootclasspath/p:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/usr/local/gump/public/workspace/xml-xerces2/build/xercesImpl.jar org.apache.tools.ant.Main -Dgump.merge=/x1/gump/public/gump/work/merge.xml -Dbuild.sysclasspath=only -Dversion=20062007 -Djavacc.home=/usr/local/gump/packages/javacc-3.1 package [Working Directory: /usr/local/gump/public/workspace/lucene-java] CLASSPATH: /opt/jdk1.5/lib/tools.jar:/usr/local/gump/public/workspace/lucene-java/build/classes/java:/usr/local/gump/public/workspace/lucene-java/build/classes/demo:/usr/local/gump/public/workspace/lucene-java/build/classes/test:/usr/local/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/usr/local/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/usr/local/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/usr/local/gump/public/workspace/lucene-java/buil
d/contrib/memory/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/queries/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/usr/local/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/usr/local/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-swing.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-trax.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-junit.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/usr/local/gump/public/workspace/ant/dist/lib/ant.jar:/usr/local/gump/packages/junit3.8.1/junit.jar:/usr/local/gump/public/workspace/xml-commons/java/build/resolver.jar:/usr/local/gump/packages/je-1.7.1/lib/je.jar:/usr/local/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/usr/local/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-20062007.jar:/usr/local/gump/packages/javacc-3.1/bin/lib/javacc.jar:/usr/local/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/usr/local/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/usr/local/gump/public/workspace/junit/dist/junit-20062007.jar:/usr/local/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/usr/local/gump/public/workspa
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506576 ]

Steven Parkes commented on LUCENE-843:
--------------------------------------

I've started looking at this, and at what it would take to merge it with the merge policy work (LUCENE-847). I noticed that there are a couple of test failures?

> improve how IndexWriter uses RAM to buffer added documents
> ----------------------------------------------------------
>
>                 Key: LUCENE-843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-843.patch, LUCENE-843.take2.patch,
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch,
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch,
> LUCENE-843.take9.patch
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents. I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
>     use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
>     in-RAM merges. Once RAM is full, flush buffers to disk (and
>     merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.
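The policy change the patch describes (flush when buffered RAM crosses a threshold, rather than after a fixed document count) can be sketched in plain Java. All names here are illustrative only, not Lucene's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the LUCENE-843 flush trigger: flush by estimated RAM
// usage (setRAMBufferSize) instead of by document count
// (setMaxBufferedDocs). None of these names are Lucene's.
class RamBufferSketch {
    private final long ramBufferBytes;              // flush threshold in bytes
    private long bytesUsed = 0;                     // estimated RAM held by buffered docs
    private final List<String> buffered = new ArrayList<>();
    int flushCount = 0;                             // how many flushes have happened

    RamBufferSketch(long ramBufferBytes) {
        this.ramBufferBytes = ramBufferBytes;
    }

    void addDocument(String doc) {
        buffered.add(doc);
        // crude estimate: 2 bytes per char of buffered text
        bytesUsed += 2L * doc.length();
        if (bytesUsed >= ramBufferBytes) {
            flush();
        }
    }

    private void flush() {
        // a real writer would write a segment here; we just reset the buffer
        buffered.clear();
        bytesUsed = 0;
        flushCount++;
    }
}
```

With this trigger, many small documents and few large documents both flush at roughly the same memory footprint, which is exactly what a fixed document count cannot guarantee.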
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-843:
--------------------------------------

    Attachment: index.presharedstores.nocfs.zip
                index.presharedstores.cfs.zip

Oh, were the test failures only in TestBackwardsCompatibility? Because I changed the index file format, I added 2 more ZIP files to that unit test, but "svn diff" doesn't pick up the new zip files. So I'm attaching them. Can you pull these zip files into your src/test/org/apache/lucene/index and test again? Thanks.

> improve how IndexWriter uses RAM to buffer added documents
> ----------------------------------------------------------
>
>                 Key: LUCENE-843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: index.presharedstores.cfs.zip,
> index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch,
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch,
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch,
> LUCENE-843.take9.patch
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents. I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
>     use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
>     in-RAM merges. Once RAM is full, flush buffers to disk (and
>     merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene 2.2.0 release available
One small change that I think we should make under "Lucene News" on the web site is to change the name "Point-in-time searching" (this is LUCENE-710, right?). Lucene has always had this feature; it's just that the implementation previously relied on the specifics of how the underlying filesystem handles deletion of open files. Windows and UNIX have the "right" semantics, but NFS (and maybe others) doesn't.

LUCENE-710 just enables you to make your own "custom deletion policy", which would then allow an application to do "point in time" searching over NFS, live backups, etc.

Maybe we should change this to "point in time searching over NFS" or "custom index deletion policies" instead?

Mike

"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On 6/19/07, DM Smith <[EMAIL PROTECTED]> wrote:
> > FYI, The announcement has not made it to the http://
> > lucene.apache.org/ page.
>
> I just committed this. It should be viewable in about an hour.
>
> Note: I had to change the syntax slightly... I'm using forrest-0.8
> now, and apparently it doesn't allow inside
>
> -Yonik
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
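The "custom deletion policy" idea being discussed can be illustrated with a self-contained sketch. This is a toy model of the concept only, not Lucene's actual IndexDeletionPolicy API: a policy that keeps the most recent N commit points alive, so a reader opened over NFS (or a live backup) can keep using an older point in time while newer commits land.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "keep the last N commits" deletion policy, as one might write for
// point-in-time searching over NFS. All names here are illustrative.
class KeepLastNCommits {
    private final int keep;

    KeepLastNCommits(int keep) { this.keep = keep; }

    // Given commit generations in increasing (oldest-first) order,
    // return the generations that are now safe to delete.
    List<Long> onCommit(List<Long> commits) {
        List<Long> toDelete = new ArrayList<>();
        int excess = commits.size() - keep;
        for (int i = 0; i < excess; i++) {
            toDelete.add(commits.get(i)); // oldest commits beyond the window
        }
        return toDelete;
    }
}
```

The key design point matches the thread: deletion becomes an application decision rather than a side effect of filesystem semantics, which is what makes the NFS case workable.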
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506609 ]

Steven Parkes commented on LUCENE-843:
--------------------------------------

Yeah, that was it. I'll be delving more into the code as I try to figure out how it will dovetail with the merge policy refactoring.

> improve how IndexWriter uses RAM to buffer added documents
> ----------------------------------------------------------
>
>                 Key: LUCENE-843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: index.presharedstores.cfs.zip,
> index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch,
> LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch,
> LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch,
> LUCENE-843.take9.patch
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents. I haven't changed anything after that, eg how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
>     use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
>     in-RAM merges. Once RAM is full, flush buffers to disk (and
>     merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Assigned: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input
[ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-933:
----------------------------------

    Assignee: Doron Cohen

> QueryParser can produce empty sub BooleanQueries when Analyzer produces no
> tokens for input
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-933
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Doron Cohen
>
> as triggered by SOLR-261, if you have a query like this...
>    +foo:BBB +(yak:AAA baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC"
> portions of the query (possibly because they are stop words) the resulting
> query produced by the QueryParser will be...
>    +foo:BBB +()
> ...that is, a BooleanQuery with two required clauses, one of which is an empty
> BooleanQuery with no clauses.
> This does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing
> encounters parens whose contents result in an empty BooleanQuery -- but
> what exactly it should do in the following situations...
>    a) +foo:BBB +()
>    b) +foo:BBB ()
>    c) +foo:BBB -()
> ...is up for interpretation. I would think situation (b) clearly lends
> itself to dropping the sub-BooleanQuery completely. Situation (c) may also
> lend itself to that solution, since semantically it means "don't allow a match
> on any queries in the empty set of queries". I have no idea what the
> "right" thing to do for situation (a) is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene 2.2.0 release available
On Wednesday 20 June 2007 03:01, Yonik Seeley wrote: > > FYI, The announcement has not made it to the http:// > > lucene.apache.org/ page. > > I just committed this. It should be viewable in about an hour. The links to the new features don't work for me, I always end up on the API overview page. Shouldn't the links be e.g. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html instead of http://lucene.apache.org/java/2_2_0/api/index.html?org/apache/lucene/document/Field.html ? Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input
[ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506703 ]

Doron Cohen commented on LUCENE-933:
------------------------------------

So an acceptable solution is: the query parser will ignore empty clauses (e.g. ' ( ) ') resulting from word filtering, the same as it already does for single words.

A straightforward fix is for QueryParser to avoid adding null (inner) queries into the (outer) clause sets. (It makes sense, too.) However, this has a side effect: for queries that became "empty" as a result of filtering (stopping), QueryParser would now return null. This is an API semantics change, because applications that used to get a BooleanQuery with 0 clauses as the parse result would now get a null query.

Here is a closer look at the behavior change.

Original behavior:
(1) parse(" ") == ParseException
(2) parse("( )") == ParseException
(3) parse("stop") == " " (actually a boolean query with 0 clauses)
(4) parse("(stop)") == " " (actually a boolean query with 0 clauses)
(5) parse("a stop b") == "a b"
(6) parse("a (stop) b") == "a () b" (middle part is a boolean query with 0 clauses)
(7) parse("a ((stop)) b") == "a () b" (again the middle part is a boolean query with 0 clauses)

Modified behavior:
(3) parse("stop") == null
(4) parse("(stop)") == null
(6) parse("a (stop) b") == "a b"
(7) parse("a ((stop)) b") == "a b"

I think the modified behavior is the right one - applications can test a query for being null and realize that it is a no-op. However, backwards compatibility is important - would this change break existing applications with annoying new NPEs?

As an alternative, QueryParser's parse() methods can be modified to return a phony empty BQ instead of returning null, for the sake of backwards compatibility. Thoughts?
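The "modified behavior" Doron describes amounts to one rule: a sub-query that parsed to nothing is never added to the enclosing clause list, and a query with no surviving clauses is itself nothing (null). That rule can be sketched in self-contained plain Java; the names are hypothetical and this is not QueryParser's real code, just the combining logic under discussion:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "ignore empty clauses" semantics from LUCENE-933.
// A query is modeled as a list of terms; null means "parsed to nothing"
// (e.g. every word was a stop word). Hypothetical names throughout.
class EmptyClauseSketch {
    // Combine sub-queries into one boolean query, skipping empty ones.
    static List<String> combine(List<List<String>> subQueries) {
        List<String> clauses = new ArrayList<>();
        for (List<String> sub : subQueries) {
            if (sub == null || sub.isEmpty()) {
                continue; // the fix: never add an empty sub-BooleanQuery
            }
            clauses.addAll(sub);
        }
        // "a (stop) b" -> "a b"; "(stop)" alone -> null (a no-op query)
        return clauses.isEmpty() ? null : clauses;
    }
}
```

Note how the null return propagates: an all-stop-word input yields null at the top level too, which is exactly the API-semantics change (and potential NPE source) the comment weighs against the backwards-compatible "phony empty BQ" alternative.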
> QueryParser can produce empty sub BooleanQueries when Analyzer produces no
> tokens for input
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-933
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Doron Cohen
>
> as triggered by SOLR-261, if you have a query like this...
>    +foo:BBB +(yak:AAA baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC"
> portions of the query (possibly because they are stop words) the resulting
> query produced by the QueryParser will be...
>    +foo:BBB +()
> ...that is, a BooleanQuery with two required clauses, one of which is an empty
> BooleanQuery with no clauses.
> This does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing
> encounters parens whose contents result in an empty BooleanQuery -- but
> what exactly it should do in the following situations...
>    a) +foo:BBB +()
>    b) +foo:BBB ()
>    c) +foo:BBB -()
> ...is up for interpretation. I would think situation (b) clearly lends
> itself to dropping the sub-BooleanQuery completely. Situation (c) may also
> lend itself to that solution, since semantically it means "don't allow a match
> on any queries in the empty set of queries". I have no idea what the
> "right" thing to do for situation (a) is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506718 ] Michael McCandless commented on LUCENE-843: --- > Yeah, that was it. Phew! > I'll be delving more into the code as I try to figure out how it will > dove tail with the merge policy factoring. OK, thanks. I am very eager to get some other eyeballs looking for issues with this patch! I *think* this patch and the merge policy refactoring should be fairly separate. With this patch, "flushing" RAM -> Lucene segment is no longer a "mergeSegments" call which I think simplifies IndexWriter. Previously mergeSegments had lots of extra logic to tell if it was merging RAM segments (= a flush) vs merging "real" segments but now it's simpler because mergeSegments really only merges segments. > improve how IndexWriter uses RAM to buffer added documents > -- > > Key: LUCENE-843 > URL: https://issues.apache.org/jira/browse/LUCENE-843 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.2 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Attachments: index.presharedstores.cfs.zip, > index.presharedstores.nocfs.zip, LUCENE-843.patch, LUCENE-843.take2.patch, > LUCENE-843.take3.patch, LUCENE-843.take4.patch, LUCENE-843.take5.patch, > LUCENE-843.take6.patch, LUCENE-843.take7.patch, LUCENE-843.take8.patch, > LUCENE-843.take9.patch > > > I'm working on a new class (MultiDocumentWriter) that writes more than > one document directly into a single Lucene segment, more efficiently > than the current approach. > This only affects the creation of an initial segment from added > documents. I haven't changed anything after that, eg how segments are > merged. > The basic ideas are: > * Write stored fields and term vectors directly to disk (don't > use up RAM for these). > * Gather posting lists & term infos in RAM, but periodically do > in-RAM merges. 
Once RAM is full, flush buffers to disk (and > merge them later when it's time to make a real segment). > * Recycle objects/buffers to reduce time/stress in GC. > * Other various optimizations. > Some of these changes are similar to how KinoSearch builds a segment. > But, I haven't made any changes to Lucene's file format nor added > requirements for a global fields schema. > So far the only externally visible change is a new method > "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is > deprecated) so that it flushes according to RAM usage and not a fixed > number documents added. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene 2.2.0 release available
Michael McCandless wrote: Maybe we should change this to "point in time searching over NFS" or "custom index deletion policies" instead? Thanks for the feedback, Mike! I agree, "point-in-time searching over NFS" describes the new addition more accurately. I will change the news entry. - Michael - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene 2.2.0 release available
Daniel Naber wrote: On Wednesday 20 June 2007 03:01, Yonik Seeley wrote: The links to the new features don't work for me, I always end up on the API overview page. Shouldn't the links be e.g. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html instead of http://lucene.apache.org/java/2_2_0/api/index.html?org/apache/lucene/document/Field.html ? Hi Daniel, that's strange. The links work for me in both Firefox and IE. Anyway, I will change the links as you suggest to point to the no-frames version of the javadocs. Then those links shouldn't cause problems anymore. Thanks! - Michael - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input
[ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-933:
-------------------------------

    Attachment: lucene-933_nullify.patch
                lucene-933_backwards_comapatible.patch

OK, attaching two different fixes (as discussed above):
(1) lucene-933_backwards_comapatible.patch
(2) lucene-933_nullify.patch

All tests pass with either of these. The "nullify" approach requires more changes, especially in the tests as well as in MemoryIndex. While fixing what was required for the tests to pass with this (nullifying) approach, I came to the conclusion that it is better to continue to not return null queries as the result of parsing, otherwise there'll be lots of "noise". So I would like to commit patch (1) - unless someone points out a problem that I missed.

> QueryParser can produce empty sub BooleanQueries when Analyzer produces no
> tokens for input
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-933
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Doron Cohen
>         Attachments: lucene-933_backwards_comapatible.patch,
> lucene-933_nullify.patch
>
> as triggered by SOLR-261, if you have a query like this...
>    +foo:BBB +(yak:AAA baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC"
> portions of the query (possibly because they are stop words) the resulting
> query produced by the QueryParser will be...
>    +foo:BBB +()
> ...that is, a BooleanQuery with two required clauses, one of which is an empty
> BooleanQuery with no clauses.
> This does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing
> encounters parens whose contents result in an empty BooleanQuery -- but
> what exactly it should do in the following situations...
>    a) +foo:BBB +()
>    b) +foo:BBB ()
>    c) +foo:BBB -()
> ...is up for interpretation. I would think situation (b) clearly lends
> itself to dropping the sub-BooleanQuery completely. Situation (c) may also
> lend itself to that solution, since semantically it means "don't allow a match
> on any queries in the empty set of queries". I have no idea what the
> "right" thing to do for situation (a) is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-936) Typo on query parser syntax web page.
Typo on query parser syntax web page. - Key: LUCENE-936 URL: https://issues.apache.org/jira/browse/LUCENE-936 Project: Lucene - Java Issue Type: Bug Components: Website Affects Versions: 2.2 Reporter: Ken Ganong Priority: Trivial On the web page http://lucene.apache.org/java/docs/queryparsersyntax.html#N10126 the text says: "To search for documents that must contain "jakarta" and may contain "lucene" use the query:" The example says: +jakarta apache The problem: The example uses apache where the text says lucene. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506752 ]

Michael Busch commented on LUCENE-843:
--------------------------------------

Hi Mike,
my first comment on this patch is: impressive! It's also quite overwhelming at the beginning, but I'm trying to dig into it. I'll probably have more questions; here's the first one: does DocumentsWriter also solve the problem DocumentWriter had before LUCENE-880? I believe the answer is yes. Even though you close the TokenStreams in the finally clause of invertField() like DocumentWriter did before 880, this is safe, because addPosition() serializes the term strings and payload bytes into the posting hash table right away. Is that right?

> improve how IndexWriter uses RAM to buffer added documents
> ----------------------------------------------------------
>
>                 Key: LUCENE-843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: index.presharedstores.cfs.zip,
>                      index.presharedstores.nocfs.zip, LUCENE-843.patch,
>                      LUCENE-843.take2.patch, LUCENE-843.take3.patch,
>                      LUCENE-843.take4.patch, LUCENE-843.take5.patch,
>                      LUCENE-843.take6.patch, LUCENE-843.take7.patch,
>                      LUCENE-843.take8.patch, LUCENE-843.take9.patch
>
>
> I'm working on a new class (MultiDocumentWriter) that writes more than
> one document directly into a single Lucene segment, more efficiently
> than the current approach.
> This only affects the creation of an initial segment from added
> documents. I haven't changed anything after that, e.g. how segments are
> merged.
> The basic ideas are:
>   * Write stored fields and term vectors directly to disk (don't
>     use up RAM for these).
>   * Gather posting lists & term infos in RAM, but periodically do
>     in-RAM merges. Once RAM is full, flush buffers to disk (and
>     merge them later when it's time to make a real segment).
>   * Recycle objects/buffers to reduce time/stress in GC.
>   * Other various optimizations.
> Some of these changes are similar to how KinoSearch builds a segment.
> But, I haven't made any changes to Lucene's file format nor added
> requirements for a global fields schema.
> So far the only externally visible change is a new method
> "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
> deprecated) so that it flushes according to RAM usage and not a fixed
> number of documents added.
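The flush policy described above (flush when buffered postings exceed a RAM budget rather than after a fixed number of documents) can be sketched as follows. This is an illustrative stand-alone model, not Lucene's implementation; the class name, the per-document byte estimate, and the numbers are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative buffer that flushes when its estimated RAM use crosses a
// budget, mirroring the setRAMBufferSize idea. Names are hypothetical.
class RamBufferedWriter {
    private final long ramBudgetBytes;
    private long bufferedBytes = 0;
    private final List<String> buffered = new ArrayList<>();
    int flushCount = 0;

    RamBufferedWriter(long ramBudgetBytes) { this.ramBudgetBytes = ramBudgetBytes; }

    void addDocument(String doc) {
        buffered.add(doc);
        // Crude per-document RAM estimate: 2 bytes per char of text.
        bufferedBytes += 2L * doc.length();
        if (bufferedBytes >= ramBudgetBytes) flush();
    }

    void flush() {
        // Real code would write the buffered postings to disk as a segment
        // here; the sketch just resets the buffer and counts flushes.
        buffered.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    public static void main(String[] args) {
        RamBufferedWriter w = new RamBufferedWriter(100); // 100-byte budget
        w.addDocument("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"); // 30 chars -> 60 bytes
        w.addDocument("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"); // 120 bytes -> flush
        System.out.println(w.flushCount);
    }
}
```

The point of the design is that flush frequency tracks actual memory pressure: many tiny documents are buffered longer, while a few huge documents trigger an early flush, which a fixed maxBufferedDocs count cannot do.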
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506778 ]

Michael Busch commented on LUCENE-843:
--------------------------------------

Mike,
the benchmarks you ran focus on measuring pure indexing performance. I think it would be interesting to know how big the speedup is in real-life scenarios, i.e. with StandardAnalyzer and maybe even HTML parsing. For sure the speedup will be less, but it should still be a significant improvement. Did you run those kinds of benchmarks already?

> improve how IndexWriter uses RAM to buffer added documents
> ----------------------------------------------------------
>
>                 Key: LUCENE-843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-843