[Lucene.Net] Fwd: Travel Assistance applications now open for ApacheCon NA 2011
-- Forwarded message -- From: Gavin McDonald ga...@16degrees.com.au Date: Jun 6, 2011 1:02 AM Subject: Travel Assistance applications now open for ApacheCon NA 2011 To: committ...@apache.org The Apache Software Foundation (ASF)'s Travel Assistance Committee (TAC) is now accepting applications for ApacheCon North America 2011, 7-11 November in Vancouver BC, Canada. The TAC is seeking individuals from the Apache community at-large --users, developers, educators, students, Committers, and Members-- who would like to attend ApacheCon, but need some financial support in order to be able to get there. There are limited places available, and all applicants will be scored on their individual merit. Financial assistance is available to cover flights/trains, accommodation and entrance fees either in part or in full, depending on circumstances. However, the support available for those attending only the BarCamp (7-8 November) is less than that for those attending the entire event (Conference + BarCamp 7-11 November). The Travel Assistance Committee aims to support all official ASF events, including cross-project activities; as such, it may be prudent for those in Asia and Europe to wait for an event geographically closer to them. More information can be found at http://www.apache.org/travel/index.html including a link to the online application and detailed instructions for submitting. Applications will close on 8 July 2011 at 22:00 BST (UTC/GMT +1). We wish good luck to all those who will apply, and thank you in advance for tweeting, blogging, and otherwise spreading the word. Regards, The Travel Assistance Committee
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044727#comment-13044727 ]

Bill Bell commented on SOLR-2242:

Since we changed the output of facet_fields, FacetComponent.java needs to change. This also impacts the DistribFieldFacet type. This code is not going to work, since price no longer has just a list of numbers. It now has multiple lists (if we set the param). We might want to always return the counts list in all cases. Then sharding can easily pick up on this... The DistribFieldFacet needs to be refactored.

{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <lst name="counts">
      <int name="0.0">3</int>
      <int name="11.5">1</int>
      <int name="19.95">1</int>
      <int name="74.99">1</int>
      <int name="92.0">1</int>
      <int name="179.99">1</int>
      <int name="185.0">1</int>
      <int name="279.95">1</int>
      <int name="329.95">1</int>
      <int name="350.0">1</int>
      <int name="399.0">1</int>
      <int name="479.95">1</int>
      <int name="649.99">1</int>
      <int name="2199.0">1</int>
    </lst>
  </lst>
</lst>
{code}

Get distinct count of names for a facet field

Key: SOLR-2242
URL: https://issues.apache.org/jira/browse/SOLR-2242
Project: Solr
Issue Type: New Feature
Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2242.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch

When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct.

Here is an example: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1

Here is an example on field hgid (without namedistinct):

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code}

With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39). This returns the number of rows (7), not the number of values (11).

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code}

This actually works really well for getting the total number of fields for a group.field=hgid. Enjoy!

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
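The difference between "rows" and "values" in the hgid example can be checked with a few lines of Java. This is only a sketch of the arithmetic, not code from the patch: the class and method names are made up for illustration, and the data is copied from the hgid example in the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SOLR-2242: it only illustrates
// "number of rows" versus "number of values" for one facet field.
class FacetTermStats {

    /** Number of distinct terms whose count reaches minCount (the "rows"). */
    static int distinctTerms(Map<String, Integer> counts, int minCount) {
        int rows = 0;
        for (int count : counts.values()) {
            if (count >= minCount) {
                rows++;
            }
        }
        return rows;
    }

    /** Sum of all per-term counts (the "values"). */
    static int totalCount(Map<String, Integer> counts) {
        int sum = 0;
        for (int count : counts.values()) {
            sum += count;
        }
        return sum;
    }

    /** The hgid facet counts from the issue description. */
    static Map<String, Integer> hgidExample() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("HGPY045FD36D4000A", 1);
        counts.put("HGPY0FBC6690453A9", 1);
        counts.put("HGPY1E44ED6C4FB3B", 1);
        counts.put("HGPY1FA631034A1B8", 1);
        counts.put("HGPY3317ABAC43B48", 1);
        counts.put("HGPY3A17B2294CB5A", 5);
        counts.put("HGPY3ADD2B3D48C39", 1);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = hgidExample();
        System.out.println(distinctTerms(counts, 1)); // 7 rows
        System.out.println(totalCount(counts));       // 11 values
    }
}
```

With mincount=1 and limit=-1 every term is listed, which is presumably why the description recommends those settings for the distinct count.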
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044730#comment-13044730 ]

Bill Bell commented on SOLR-2242:

It would be easier for sharding to not have multiple lists... I could use some help if we want to change it, since I have not played with FacetComponent.java. Otherwise, it would be a simpler fix to just add it and flatten the lists.

{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int>
    <int name="11.5">1</int>
    <int name="19.95">1</int>
    <int name="74.99">1</int>
    <int name="92.0">1</int>
    <int name="179.99">1</int>
    <int name="185.0">1</int>
    <int name="279.95">1</int>
    <int name="329.95">1</int>
    <int name="350.0">1</int>
    <int name="399.0">1</int>
    <int name="479.95">1</int>
    <int name="649.99">1</int>
    <int name="2199.0">1</int>
  </lst>
</lst>
{code}

Not ideal, but easier for v1? I could also just remove numFacetTerms=2 for now. It will only require an if statement to ignore the type check for numFacetTerms.

Here is a patch that works with sharding.

http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price

Enjoy. Bill
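One reason the sharded case needs care is that a distinct-term count cannot simply be added up across shards: a term present on two shards would be counted twice. The sketch below illustrates this with made-up shard data. It is not how FacetComponent or the attached patch works; the class name, method names, and sample counts are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names and data are not from Solr.
class ShardFacetMerge {

    /** Merges per-shard term counts by summing the counts of equal terms. */
    static Map<String, Integer> merge(List<Map<String, Integer>> shards) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> entry : shard.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /** Naive approach: add up each shard's own distinct-term count. */
    static int naiveTermCount(List<Map<String, Integer>> shards) {
        int sum = 0;
        for (Map<String, Integer> shard : shards) {
            sum += shard.size();
        }
        return sum;
    }

    /** Two made-up shards that share the term "0.0". */
    static List<Map<String, Integer>> exampleShards() {
        Map<String, Integer> shardA = new TreeMap<>();
        shardA.put("0.0", 2);
        shardA.put("11.5", 1);
        Map<String, Integer> shardB = new TreeMap<>();
        shardB.put("0.0", 1);
        shardB.put("19.95", 1);
        List<Map<String, Integer>> shards = new ArrayList<>();
        shards.add(shardA);
        shards.add(shardB);
        return shards;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> shards = exampleShards();
        System.out.println(merge(shards).size());   // 3 distinct terms overall
        System.out.println(naiveTermCount(shards)); // 4, counts "0.0" twice
    }
}
```

Under these assumptions the merged term list is what yields a correct distinct count, which fits the suggestion of always returning the counts so the sharding code can pick them up.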
lucene mailing list archives zip?
Dear list -- is there a proper archive of the lucene dev and user Mailman lists? A per-month link, or a zip or tar.gz of the mbox files, would be terrific. Thanks in advance, gregor
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated SOLR-2242: Attachment: SOLR-2242.shard.patch
[jira] [Commented] (SOLR-2575) post.jar does not work on trunk
[ https://issues.apache.org/jira/browse/SOLR-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044739#comment-13044739 ]

Uwe Schindler commented on SOLR-2575:

The problem with the example is Jetty's caching of webapps: it caches the unpacked WAR file. To clean up the web application, you have to remove the unpacked web application in the work folder of example. Maybe clean should do this automatically. This drove me crazy when modifying JSP files, too.

post.jar does not work on trunk

Key: SOLR-2575
URL: https://issues.apache.org/jira/browse/SOLR-2575
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Reporter: Bill Bell

java -jar post.jar *.xml
SimplePostTool: version 1.3
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file gb18030-example.xml
SimplePostTool: POSTing file hd.xml
SimplePostTool: POSTing file ipod_other.xml
SimplePostTool: POSTing file ipod_video.xml
SimplePostTool: POSTing file manufacturers.xml
SimplePostTool: POSTing file mem.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: POSTing file monitor2.xml
SimplePostTool: POSTing file mp500.xml
SimplePostTool: POSTing file sd500.xml
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file utf8-example.xml
SimplePostTool: POSTing file vidcard.xml
SimplePostTool: COMMITting Solr index changes..
SimplePostTool: FATAL: Solr returned an error #500 java.lang.NoSuchMethodError: org.apache.lucene.util.CodecUtil.checkHeader(Lorg/apache/lucene/store/IndexInput;Ljava/lang/String;II)I

java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.lucene.util.CodecUtil.checkHeader(Lorg/apache/lucene/store/IndexInput;Ljava/lang/String;II)I
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1039)
 at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:346)
 at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler
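The cleanup described in the comment (removing the unpacked web application from the work folder) could be scripted roughly as follows. This is a minimal sketch, not part of the Solr build: the class name is hypothetical, and the default path "example/work" is an assumption about where the example's work folder lives relative to the current directory. The sketch empties the folder but leaves the folder itself in place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; the "example/work" default is an assumption.
class CleanJettyWork {

    /** Deletes everything below dir, but keeps dir itself. */
    static void deleteContents(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return; // nothing to clean
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            ordered = paths.filter(p -> !p.equals(dir))
                           .sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Paths.get(args.length > 0 ? args[0] : "example/work");
        deleteContents(work);
    }
}
```

Run it with the server stopped, passing the work folder as the first argument if it is somewhere else.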
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8635 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8635/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexFileDeleter.testDeleteLeftoverFiles Error Message: CheckIndex failed Stack Trace: java.lang.RuntimeException: CheckIndex failed at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:142) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:481) at org.apache.lucene.index.TestIndexFileDeleter.testDeleteLeftoverFiles(TestIndexFileDeleter.java:165) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1227) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1145) Build Log (for compile errors): [...truncated 5069 lines...]
Re: lucene mailing list archives zip?
Hi, maybe there are other links, but you can try the following. This link opens an application that can browse the archive: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106.mbox/browser And this link is the raw cumulative mail archive file for that month: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106 So you can wget these files (but I think you should be friendly to the server). Regards, Lukas On Mon, Jun 6, 2011 at 8:54 AM, Gregor Heinrich gre...@arbylon.net wrote: Dear list -- is there any archive proper of the lucene dev and user Mailman lists? A link per-month or zip or tar.gz of the mbox files would be terrific. Thanks in advance gregor
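To fetch several months, the per-month URLs can be generated from the pattern in the link above. The sketch below only builds the URL list and does no downloading; the class and method names are hypothetical, and it assumes other months follow the same yyyyMM pattern as the 201106 link.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds monthly archive URLs from the pattern
// in the message; it assumes every month uses the same yyyyMM form.
class MboxArchiveUrls {

    static final String BASE = "http://mail-archives.apache.org/mod_mbox/";

    /** Returns one URL per month from 'from' to 'to', both inclusive. */
    static List<String> monthlyUrls(String list, YearMonth from, YearMonth to) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("uuuuMM");
        List<String> urls = new ArrayList<>();
        for (YearMonth m = from; !m.isAfter(to); m = m.plusMonths(1)) {
            urls.add(BASE + list + "/" + m.format(fmt));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = monthlyUrls("lucene-java-user",
                YearMonth.of(2011, 4), YearMonth.of(2011, 6));
        for (String url : urls) {
            System.out.println(url);
        }
    }
}
```

The printed list can be fed to wget one URL at a time, with a pause between requests to stay friendly to the server.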
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 8635 - Failure
I'll dig... somehow, strangely, it seems to be caused by the test speedups... Mike McCandless http://blog.mikemccandless.com On Mon, Jun 6, 2011 at 4:11 AM, Apache Jenkins Server hud...@hudson.apache.org wrote: Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8635/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexFileDeleter.testDeleteLeftoverFiles
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 8635 - Failure
I committed a fix. It was a bug in the test, only uncovered because the test speedups decreased the chance that newField would turn on term vectors if you didn't ask for them... Mike McCandless http://blog.mikemccandless.com On Mon, Jun 6, 2011 at 6:21 AM, Michael McCandless luc...@mikemccandless.com wrote: I'll dig... somehow, strangely, it seems to be caused by the test speedups... Mike McCandless http://blog.mikemccandless.com On Mon, Jun 6, 2011 at 4:11 AM, Apache Jenkins Server hud...@hudson.apache.org wrote: Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8635/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexFileDeleter.testDeleteLeftoverFiles
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8638 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8638/ 4 tests failed. REGRESSION: org.apache.lucene.index.TestLazyBug.testLazyWorks Error Message: Read past EOF Stack Trace: java.io.IOException: Read past EOF at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90) at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63) at org.apache.lucene.store.DataInput.readInt(DataInput.java:73) at org.apache.lucene.store.DataInput.readLong(DataInput.java:115) at org.apache.lucene.store.MockIndexInputWrapper.readLong(MockIndexInputWrapper.java:128) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:211) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:463) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:565) at org.apache.lucene.index.TestLazyBug.doTest(TestLazyBug.java:105) at org.apache.lucene.index.TestLazyBug.testLazyWorks(TestLazyBug.java:129) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) REGRESSION: org.apache.lucene.index.TestLazyBug.testLazyAlsoWorks Error Message: Read past EOF Stack Trace: java.io.IOException: Read past EOF at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90) at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63) at org.apache.lucene.store.DataInput.readInt(DataInput.java:73) at org.apache.lucene.store.DataInput.readLong(DataInput.java:115) at org.apache.lucene.store.MockIndexInputWrapper.readLong(MockIndexInputWrapper.java:128) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:211) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:463) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:565) at org.apache.lucene.index.TestLazyBug.doTest(TestLazyBug.java:105) at 
org.apache.lucene.index.TestLazyBug.testLazyAlsoWorks(TestLazyBug.java:133) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) REGRESSION: org.apache.lucene.index.TestLazyBug.testLazyBroken Error Message: Read past EOF Stack Trace: java.io.IOException: Read past EOF at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90) at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63) at org.apache.lucene.store.DataInput.readInt(DataInput.java:73) at org.apache.lucene.store.DataInput.readLong(DataInput.java:115) at org.apache.lucene.store.MockIndexInputWrapper.readLong(MockIndexInputWrapper.java:128) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:211) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:463) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:565) at org.apache.lucene.index.TestLazyBug.doTest(TestLazyBug.java:105) at org.apache.lucene.index.TestLazyBug.testLazyBroken(TestLazyBug.java:137) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestLazyBug Error Message: MockDirectoryWrapper: cannot close: there are still open files: {_0_2.doc=3, _0_1.skp=3, _1_0.frq=3, _0_2.frq=3, _1_2.frq=3, _1_3.frq=3, _1_3.tib=3, _0_1.doc=3, _0_2.pyl=3, _1_1.tib=3, _0_0.tib=3, _0.tvd=3, _1_2.doc=3, _0.tvf=3, _1_0.prx=3, _0_1.frq=3, _1_2.pyl=3, _1.fdx=3, _0_3.prx=3, _0_2.skp=3, _1.fdt=3, _0.tvx=3, _0_2.pos=3, _1_3.prx=3, _1.nrm=3, _1_1.pyl=3, _1_0.tib=3, _0_1.tib=3, _1_2.tib=3, _1_1.doc=3, _0_0.prx=3, _1_1.frq=3, _0_3.frq=3, _1.tvx=3, _0.nrm=3, _0_0.frq=3, _1_1.pos=3, _0_3.tib=3, _0_2.tib=3, _1_1.skp=3, _1.tvf=3, _1_2.skp=3, _0_1.pyl=3, 
_0.fdx=3, _1.tvd=3, _1_2.pos=3, _0_1.pos=3, _0.fdt=3} Stack Trace: java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_0_2.doc=3, _0_1.skp=3, _1_0.frq=3, _0_2.frq=3, _1_2.frq=3, _1_3.frq=3, _1_3.tib=3, _0_1.doc=3, _0_2.pyl=3, _1_1.tib=3, _0_0.tib=3, _0.tvd=3, _1_2.doc=3, _0.tvf=3, _1_0.prx=3, _0_1.frq=3, _1_2.pyl=3, _1.fdx=3, _0_3.prx=3, _0_2.skp=3, _1.fdt=3, _0.tvx=3, _0_2.pos=3, _1_3.prx=3, _1.nrm=3, _1_1.pyl=3, _1_0.tib=3, _0_1.tib=3, _1_2.tib=3, _1_1.doc=3, _0_0.prx=3, _1_1.frq=3, _0_3.frq=3, _1.tvx=3, _0.nrm=3, _0_0.frq=3, _1_1.pos=3, _0_3.tib=3, _0_2.tib=3, _1_1.skp=3, _1.tvf=3, _1_2.skp=3, _0_1.pyl=3, _0.fdx=3,
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044820#comment-13044820 ] Michael McCandless commented on SOLR-2564: Patch looks great Martijn! Only thing I noticed is cacheSizeMB is computed incorrectly from maxDoc (for the -1 case), because that's all int math I think? Ie it'll be truncated from eg 13.7 MB -> 13. But: why not just use Double.MAX_VALUE? Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.
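The truncation mentioned in the comment is ordinary Java integer division. The sketch below shows the effect with a byte count chosen to be roughly 13.7 MB; it does not reproduce the patch's actual formula for deriving the size from maxDoc, and the class and method names are hypothetical.

```java
// Hypothetical illustration of int versus double division; the patch's
// real computation from maxDoc is not shown in the comment.
class CacheSizeMath {

    static final int BYTES_PER_MB = 1024 * 1024;

    /** All-int math: the fractional part is dropped. */
    static int truncatedMB(int bytes) {
        return bytes / BYTES_PER_MB;
    }

    /** Promoting one operand to double keeps the fraction. */
    static double exactMB(int bytes) {
        return bytes / (double) BYTES_PER_MB;
    }

    public static void main(String[] args) {
        int bytes = 14_365_491; // roughly 13.7 MB
        System.out.println(truncatedMB(bytes)); // 13
        System.out.println(exactMB(bytes));     // about 13.7
    }
}
```

Casting one operand before dividing is the usual fix when the result is meant to be a fractional number of megabytes.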
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044821#comment-13044821 ] Michael McCandless commented on SOLR-2564: bq. The other use-case is more like field collapsing and does change what documents match (basically, only the first documents in each group, up to limit, match). I'm not sure it's that simple, ie that we can so cleanly model collapsing as reducing the docs to consider and then running faceting on that reduced set. EG, the use case of getting correct facet counts for a field that has different values within the group, can't be handled by this approach? This is the count=2 for size=S in my example at https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038605&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038605 I think to do that properly, the faceting impl needs to see all docs in the group, not just the lead doc per group. I think another way to visualize/model this is that we really need to be able to configure which field "counts" (ID_FIELD) for the schema. This field would then decide all counts -- total hit count, facet counts, etc., ie each of these counts is count(unique(ID_FIELD)) of the docs falling in that facet/result set. The default is Lucene's docid, but the app should be able to state any other ID_FIELD.
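The count(unique(ID_FIELD)) idea from the comment can be sketched in a few lines. Everything here is made up for illustration: the rows stand for matching documents, the first column plays the role of ID_FIELD, the second is the faceted field, and none of the names come from Lucene, Solr, or the example in LUCENE-3097.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration; data and names are not from Lucene/Solr.
class UniqueIdFacetCounts {

    /** rows are {idFieldValue, facetValue}; result maps facetValue to distinct ids. */
    static Map<String, Integer> countUnique(String[][] rows) {
        Map<String, Set<String>> idsPerFacet = new TreeMap<>();
        for (String[] row : rows) {
            idsPerFacet.computeIfAbsent(row[1], k -> new HashSet<>()).add(row[0]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : idsPerFacet.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    /** Made-up documents: two products, each with several size variants. */
    static String[][] exampleRows() {
        return new String[][] {
            {"shirt1", "S"},
            {"shirt1", "M"},
            {"shirt1", "S"}, // second S variant of the same product
            {"shirt2", "S"},
            {"shirt2", "L"},
        };
    }

    public static void main(String[] args) {
        // Prints the per-size counts of distinct product ids.
        System.out.println(countUnique(exampleRows()));
    }
}
```

In this made-up data a per-document count would report 3 for size S, while counting only the first document of each product would miss M and L entirely; counting distinct ids over all documents in each group gives S=2, M=1, L=1.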
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8638 - Failure
bug in the test... i committed a fix On Mon, Jun 6, 2011 at 8:14 AM, Apache Jenkins Server hud...@hudson.apache.org wrote: Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8638/ 4 tests failed.
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044828#comment-13044828 ]

Mark Harwood commented on LUCENE-2454:
--
Below are 2 example tests searching employment resumes, both using the same optional and mandatory clauses but in subtly different ways. Question 1 is "who has Mahout skills and preferably used them at Lucid?" while the other question is "who has Mahout skills and preferably has been employed by Lucid?". The questions and the answers are different.

Below is the XML test script I used to illustrate the data/queries used, define expected results and run as an executable test. Hopefully you can make sense of this:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<Test description="NestedQuery tests">
  <Data>
    <Index name="ResumeIndex">
      <Analyzers class="org.apache.lucene.analysis.WhitespaceAnalyzer">
      </Analyzers>
      <Shard name="shard1">
        <!-- === -->
        <Document pk="1">
          <Field name="name">grant</Field>
          <Field name="docType">resume</Field>
        </Document>
        <!-- === -->
        <Document pk="2">
          <Field name="employer">lucid</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">java lucene</Field>
        </Document>
        <!-- === -->
        <Document pk="3">
          <Field name="employer">somewhere else</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">mahout and more mahout</Field>
        </Document>
        <!-- === -->
        <Document pk="4">
          <Field name="name">sean</Field>
          <Field name="docType">resume</Field>
        </Document>
        <!-- === -->
        <Document pk="5">
          <Field name="employer">foo bar</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">java</Field>
        </Document>
        <!-- === -->
        <Document pk="6">
          <Field name="employer">some co</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">mahout mahout and more mahout</Field>
        </Document>
      </Shard>
    </Index>
  </Data>
  <Tests>
    <Test description="Who knows Mahout and preferably used it *while employed at Lucid*?">
      <Query>
        <NestedQuery>
          <!-- testing properties of individual child employment docs -->
          <Query>
            <BooleanQuery>
              <Clause occurs="must">
                <TermsQuery fieldName="skills">mahout</TermsQuery>
              </Clause>
              <Clause occurs="should">
                <TermsQuery fieldName="employer">lucid</TermsQuery>
              </Clause>
            </BooleanQuery>
          </Query>
          <ParentsFilter>
            <TermsFilter fieldName="docType">resume</TermsFilter>
          </ParentsFilter>
        </NestedQuery>
      </Query>
      <ExpectedResults why="Grant's tenure at Lucid is
[jira] [Created] (LUCENE-3176) TestNRTThreads test failure
TestNRTThreads test failure --- Key: LUCENE-3176 URL: https://issues.apache.org/jira/browse/LUCENE-3176 Project: Lucene - Java Issue Type: Bug Environment: trunk Reporter: Robert Muir hit a fail in TestNRTThreads running tests over and over: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3176) TestNRTThreads test failure
[ https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044840#comment-13044840 ] Robert Muir commented on LUCENE-3176: - {noformat} [junit] Testsuite: org.apache.lucene.index.TestNRTThreads [junit] Testcase: testNRTThreads(org.apache.lucene.index.TestNRTThreads): FAILED [junit] expected:8 but was:18 [junit] junit.framework.AssertionFailedError: expected:8 but was:18 [junit] at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:515) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 19.812 sec [junit] [junit] - Standard Output --- [junit] doc id=157 is supposed to be deleted, but got docID=119 [junit] doc id=82 is supposed to be deleted, but got docID=68 [junit] doc id=83 is supposed to be deleted, but got docID=38 [junit] doc id=80 is supposed to be deleted, but got docID=36 [junit] doc id=81 is supposed to be deleted, but got docID=37 [junit] doc id=67 is supposed to be deleted, but got docID=24 [junit] doc id=69 is supposed to be deleted, but got docID=26 [junit] doc id=68 is supposed to be deleted, but got docID=25 [junit] doc id=672 is supposed to be deleted, but got docID=430 [junit] doc id=444 is supposed to be deleted, but got docID=344 [junit] doc id=441 is supposed to be deleted, but got docID=766 [junit] doc id=442 is supposed to be deleted, but got docID=343 [junit] doc id=443 is supposed to be deleted, but got docID=767 [junit] doc id=70 is supposed to be deleted, but got docID=67 [junit] doc id=71 is supposed to be deleted, but got docID=27 [junit] doc id=72 is supposed to be deleted, but got docID=28 [junit] doc id=73 is supposed to be deleted, but got docID=29 [junit] doc id=74 is supposed to be deleted, but got 
docID=30 [junit] doc id=75 is supposed to be deleted, but got docID=31 [junit] doc id=76 is supposed to be deleted, but got docID=32 [junit] doc id=219 is supposed to be deleted, but got docID=175 [junit] doc id=662 is supposed to be deleted, but got docID=425 [junit] doc id=663 is supposed to be deleted, but got docID=426 [junit] doc id=218 is supposed to be deleted, but got docID=174 [junit] doc id=361 is supposed to be deleted, but got docID=286 [junit] doc id=362 is supposed to be deleted, but got docID=287 [junit] doc id=360 is supposed to be deleted, but got docID=285 [junit] doc id=366 is supposed to be deleted, but got docID=291 [junit] doc id=365 is supposed to be deleted, but got docID=290 [junit] doc id=364 is supposed to be deleted, but got docID=289 [junit] doc id=363 is supposed to be deleted, but got docID=288 [junit] doc id=368 is supposed to be deleted, but got docID=293 [junit] doc id=367 is supposed to be deleted, but got docID=292 [junit] doc id=518 is supposed to be deleted, but got docID=361 [junit] doc id=517 is supposed to be deleted, but got docID=805 [junit] doc id=220 is supposed to be deleted, but got docID=176 [junit] doc id=324 is supposed to be deleted, but got docID=269 [junit] doc id=322 is supposed to be deleted, but got docID=268 [junit] - --- [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestNRTThreads -Dtestmethod=testNRTThreads -Dtests.seed=0:0 [junit] NOTE: test params are: codec=RandomCodecProvider: {extra8=MockFixedIntBlock(blockSize=1054), extra9=MockVariableIntBlock(baseBlockSize=87), body=MockSep, extra0=MockVariableIntBlock(baseBlockSize=87), packID=Pulsing(freqCutoff=16), extra1=MockRandom, extra2=Standard, extra3=SimpleText, date=MockVariableIntBlock(baseBlockSize=87), extra4=MockSep, extra5=Pulsing(freqCutoff=16), extra6=MockFixedIntBlock(blockSize=1054), extra7=MockVariableIntBlock(baseBlockSize=87), docid=MockVariableIntBlock(baseBlockSize=87), title=SimpleText, 
titleTokenized=Standard}, locale=ar_JO, timezone=Europe/Oslo [junit] NOTE: all tests run in this JVM: [junit] [TestSearchForDuplicates, TestMockAnalyzer, TestCheckIndex, TestDoc, TestFlex, TestIndexReaderCloneNorms, TestIndexWriterExceptions, TestIndexWriterUnicode, TestMultiLevelSkipList, TestNRTThreads] [junit] NOTE: Mac OS X 10.6.7 x86_64/Apple Inc. 1.6.0_24 (64-bit)/cpus=4,threads=1,free=41147720,total=85000192 [junit] - --- [junit] TEST org.apache.lucene.index.TestNRTThreads FAILED {noformat} TestNRTThreads test failure ---
[jira] [Commented] (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
[ https://issues.apache.org/jira/browse/LUCENE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044860#comment-13044860 ]

David Smiley commented on LUCENE-2645:
--
Thanks for the test, KuroSaka. I didn't realize that my bug report last year, saying that an assert condition's > should become >=, was insufficient for a committer to simply make the 1-char change. I guess I should work on creating tests for nearly everything so that my bug reports get more traction. :-|

False assertion of 0 position delta in StandardPostingsWriterImpl
--
Key: LUCENE-2645
URL: https://issues.apache.org/jira/browse/LUCENE-2645
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 4.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Minor
Fix For: 4.0
Attachments: LuceneTrunkAssertErrorReproducer.java

StandardPostingsWriterImpl line 159 is:
{code:java}
assert delta > 0 || position == 0 || position == -1: "position=" + position + " lastPosition=" + lastPosition; // not quite right (if pos=0 is repeated twice we don't catch it)
{code}
I enable assertions when I run my unit tests and I've found this assertion to fail when delta is 0, which occurs when the same position value is sent in twice in a row. Once I added RemoveDuplicatesTokenFilter, this problem went away. Should I really be forced to add this filter? I think delta >= 0 would be a better assertion.
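To make the proposed change concrete, here is a minimal standalone sketch of the relaxed check. It is not the StandardPostingsWriterImpl code: the class and method names are hypothetical, it models only the delta bookkeeping, and it throws an explicit exception where the real code uses `assert`, so the behavior does not depend on running with -ea.

```java
/** Hypothetical stand-in for the position bookkeeping; not Lucene's actual class. */
final class PositionDeltaSketch {
    private int lastPosition = 0;

    /** Records a position and returns the delta that would be written. */
    int addPosition(int position) {
        int delta = position - lastPosition;
        // Relaxed check (delta >= 0): a delta of 0 is accepted, because two
        // tokens may legitimately share a position. Only going backwards fails.
        if (delta < 0) {
            throw new IllegalStateException(
                "position=" + position + " lastPosition=" + lastPosition);
        }
        lastPosition = position;
        return delta;
    }
}
```

With this check, sending position 3 twice in a row yields deltas 3 and 0 without an error, while a position lower than the previous one is still rejected.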
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044867#comment-13044867 ]

Yonik Seeley commented on SOLR-2564:
--
bq. The other use-case is more like field collapsing and does change what documents match (basically, only the first documents in each group, up to limit, match).

bq. I'm not sure it's that simple, ie that we can so cleanly model collapsing as reducing the docs to consider and then running faceting on that reduced set.

I *think* that's what was actually implemented in SOLR-236 IIRC, and what some people seem to be asking for.

bq. EG, the use case of getting correct facet counts for a field that has different values within the group, can't be handled by this approach?

Well, "correct" is a matter of context ;-) (for example, some have called the facet counts for the current grouping implementation "incorrect" because it didn't happen to match their use case). Looking at the original description in LUCENE-3097, it seems you're talking about Martijn's 3rd method, while I was talking about the 2nd. But maybe some people that were originally advocating for #2 really wanted #3?

Integrating grouping module into Solr 4.0
--
Key: SOLR-2564
URL: https://issues.apache.org/jira/browse/SOLR-2564
Project: Solr
Issue Type: Improvement
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Fix For: 4.0
Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch

Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044895#comment-13044895 ]

Joan Codina commented on SOLR-2399:
---
I made some changes to the current version of the Schema-Browser some time ago; you can find them in SOLR-2440. It has some features that I found interesting:
A. Drill down, so you can select a word in the list of most common words and perform a query.
B. Select the list of fields to be the output of the query.
This is apart from sorting the field names and showing them in alphabetical order, not capitalised.

Solr Admin Interface, reworked
--
Key: SOLR-2399
URL: https://issues.apache.org/jira/browse/SOLR-2399
Project: Solr
Issue Type: Improvement
Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-admin-interface.patch, SOLR-2399-fluid-width.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch

*The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]

*Features:*
* [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
* [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
* [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
* [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400)
* [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
* [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
* [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
* [Replication|http://files.mathe.is/solr-admin/10_replication.png]
* [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
* [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
** Stub (using static data)

(!) As Erick pointed out, Chrome's XML capabilities are a bit odd, so it does not render raw XML data (like we're using for displaying the Schema and Config-File); instead it looks like this: http://files.mathe.is/solr-admin/00_chrome-xml.png ; so it would be really nice to see the [xinclude-Interface|http://files.mathe.is/solr-admin/xinclude/] there :)

Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI

I've quickly created a Github-Repository (just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin
[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
--
Attachment: SOLR-2462.patch

I guess I should have run that one myself too. This test is very similar to the ones in SpellCheckCollatorTest. I guess while the ones in SCCT test whether or not it can collate properly, TSCR checks that the response it sends back is proper. In any case, this is just another one of my brittle tests! Because we're using a different comparator, results with tied scores don't come back exactly the same as before. So now this test needs more than 5 tries to find the 2nd valid collation. I upped it from 5 to 10 and now it passes.

Using spellcheck.collate can result in extremely high memory usage
--
Key: SOLR-2462
URL: https://issues.apache.org/jira/browse/SOLR-2462
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
Fix For: 3.1.1, 4.0
Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010.

We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen as occasionally a user will accidentally paste the URL into the Search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15.
[jira] [Updated] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2571:
--
Attachment: SOLR-2571.patch

This version takes all of DirectSolrSpellChecker's parameters as Integer and Float objects rather than Strings, as appropriate. Also, I changed the accuracy parameter to use SpellingParams.SPELLCHECK_ACCURACY ... I'm not sure if this would have invalidated any unit tests (I didn't see any tests that use DirectSolrSpellChecker). I think this will make DirectSolrSpellChecker more consistent with the rest of solrconfig.xml's parameter requirements. The only better option than this, maybe, would be to make it flexible and allow either the Int/Float or String in these cases. I think this latter option is not necessary, however.

IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
--
Key: SOLR-2571
URL: https://issues.apache.org/jira/browse/SOLR-2571
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
Labels: whereIsHossManWhenYouNeedHim
Fix For: 3.3, 4.0
Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch

When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value.

This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently in the User's mailing list so I am raising an issue now.
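The "flexible" option mentioned in the comment above (accept either a numeric object or a String) could look roughly like the following. This is a hedged sketch, not code from the attached patch: the class name `ThresholdParamSketch` and the method `toFloat` are hypothetical, and the value is assumed to arrive as a plain `Object`, as it would when read from a configuration list.

```java
/** Hypothetical helper illustrating lenient parsing; not part of Solr. */
final class ThresholdParamSketch {

    /**
     * Converts a configuration value to a float.
     * Accepts a Number (e.g. Float, Integer) or anything whose string form
     * parses as a float; returns defaultValue when the value is absent.
     */
    static float toFloat(Object value, float defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        if (value instanceof Number) {
            return ((Number) value).floatValue();
        }
        // A direct (Float) cast would throw ClassCastException for a String,
        // so the string form is parsed instead.
        return Float.parseFloat(value.toString().trim());
    }
}
```

For example, `ThresholdParamSketch.toFloat("0.01", 0f)` and `ThresholdParamSketch.toFloat(Float.valueOf(0.01f), 0f)` both return `0.01f`. A string that is not a number still fails, with a NumberFormatException.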
[jira] [Commented] (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
[ https://issues.apache.org/jira/browse/LUCENE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044918#comment-13044918 ]

Michael McCandless commented on LUCENE-2645:
--
While test cases are always welcome, they certainly are not necessary in a patch (Yonik's Law of Patches). Which issue had you opened before? Somehow it fell through the cracks... which, unfortunately, happens all the time in open-source. Best to bump/gently nag on important fixes...
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044917#comment-13044917 ]

James Dyer commented on SOLR-2571:
--
{quote} what makes this 'decision' of correctlySpelled? Do you know? {quote}

I took a quick look to find out. It's more complicated than I thought! Here's the basic gist (I think!):
- If the instance of SolrSpellChecker returns frequency data and all suggestions have frequency > 0, TRUE.
- If the instance of SolrSpellChecker returns frequency data and any suggestion has frequency == 0, FALSE.
- If the instance of SolrSpellChecker returns NO frequency data but has suggestions, OMIT.
- If the instance of SolrSpellChecker returns NO suggestions, FALSE.

Possibly this isn't fully accurate but I'm at least mostly correct here. It seems like the discrepancy with DirectSolrSpellChecker is because it isn't returning frequency info? This all happens in SpellCheckComponent.toNamedList() ... I'm guessing the code here uses the presence or absence of frequency data as a kind of proxy indicator of whether it's dealing with IndexBasedSpellChecker or FileBasedSpellChecker. Possibly it would be better if each instance of SolrSpellChecker had an isCorrectlySpelled() method that toNamedList() could call? Maybe I should go open another jira issue for that?
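The four rules listed in the comment above can be written down as a small decision function. This is only a sketch of the rules as the comment states them, which the commenter himself flags as possibly inaccurate; it is not the logic of SpellCheckComponent.toNamedList(). The class, enum, and parameter names are hypothetical, and checking the "no suggestions" rule first is an assumption about how the rules are ordered.

```java
/** Hypothetical model of the rules described in the comment; not Solr code. */
final class CorrectlySpelledSketch {

    enum Decision { TRUE, FALSE, OMIT }

    /**
     * @param suggestionCount number of suggestions returned
     * @param frequencies     per-suggestion frequencies, or null when the
     *                        spell checker returns no frequency data
     */
    static Decision decide(int suggestionCount, int[] frequencies) {
        if (suggestionCount == 0) {
            return Decision.FALSE;      // no suggestions at all
        }
        if (frequencies == null) {
            return Decision.OMIT;       // suggestions, but no frequency data
        }
        for (int frequency : frequencies) {
            if (frequency <= 0) {
                return Decision.FALSE;  // some suggestion has frequency 0
            }
        }
        return Decision.TRUE;           // every suggestion has frequency > 0
    }
}
```

Under this reading, a checker that never supplies frequency data can only produce OMIT or FALSE, which would be consistent with the discrepancy the comment attributes to DirectSolrSpellChecker.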
[jira] [Assigned] (LUCENE-3176) TestNRTThreads test failure
[ https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3176: -- Assignee: Michael McCandless TestNRTThreads test failure --- Key: LUCENE-3176 URL: https://issues.apache.org/jira/browse/LUCENE-3176 Project: Lucene - Java Issue Type: Bug Environment: trunk Reporter: Robert Muir Assignee: Michael McCandless hit a fail in TestNRTThreads running tests over and over: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
[ https://issues.apache.org/jira/browse/LUCENE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044926#comment-13044926 ]

David Smiley commented on LUCENE-2645:
--
bq. Which issue had you opened before?

This one! ;-) But if you want to give KuroSaka credit for it because he submitted a patch, then fine. He went the extra mile that I didn't think was necessary.
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044927#comment-13044927 ]

Michael McCandless commented on SOLR-2564:
--
I think a good criterion for "correct" is: if you were to click through on the facet (ie, take the current query and add a filter on facet field = facet value), would the hit count you see match the facet count you were just looking at? Ie, drill down should be consistent.

Both approaches will give the same facet counts if the field never varies within the group (ie, the field belongs to the parent docs); it's only child fields where you need faceting to be aware of the groups, so for apps that never display facets on child fields, only computing facets on the group heads will work.

I suspect doc blocks will be the only practical way to implement faceting on child fields efficiently.
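The drill-down criterion can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not Solr or grouping-module code: the `Doc` class and both counting methods are hypothetical, documents are assumed to be listed in relevance order so that the first document of each group is its head, and the drill-down hit count is assumed to be counted in groups.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of facet counting over grouped documents. */
final class GroupFacetSketch {

    static final class Doc {
        final String group;  // group (parent) key
        final String value;  // facet field value on this document
        Doc(String group, String value) {
            this.group = group;
            this.value = value;
        }
    }

    /** Facet count computed from group heads only (first doc seen per group). */
    static int headOnlyCount(List<Doc> rankedDocs, String value) {
        Set<String> seenGroups = new HashSet<>();
        int count = 0;
        for (Doc doc : rankedDocs) {
            boolean isHead = seenGroups.add(doc.group);
            if (isHead && doc.value.equals(value)) {
                count++;
            }
        }
        return count;
    }

    /** Number of groups that still match after filtering on field = value. */
    static int drillDownGroupCount(List<Doc> rankedDocs, String value) {
        Set<String> matchingGroups = new HashSet<>();
        for (Doc doc : rankedDocs) {
            if (doc.value.equals(value)) {
                matchingGroups.add(doc.group);
            }
        }
        return matchingGroups.size();
    }
}
```

With documents (g1, red), (g1, blue), (g2, blue), the head-only count for "blue" is 1 but drilling down on "blue" leaves 2 groups, so the two disagree for a field that varies within a group. If every document in a group carries the same value, both methods return the same number.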
[jira] [Commented] (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
[ https://issues.apache.org/jira/browse/LUCENE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044928#comment-13044928 ]

Michael McCandless commented on LUCENE-2645:
--
D'oh! Woops :) I didn't see that you had opened this issue! And I missed it from last September... sorry :( I will add you to CHANGES. And no that extra mile is not necessary. Just some gentle nagging would help stuff not fall past the event horizons on our todo lists :)
[jira] [Commented] (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
[ https://issues.apache.org/jira/browse/LUCENE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044929#comment-13044929 ]

KuroSaka TeruHiko commented on LUCENE-2645:
--
Thank you, Michael, for the quick fix, and David, for initially reporting this bug and giving me credit :-)
[jira] [Commented] (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
[ https://issues.apache.org/jira/browse/LUCENE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044930#comment-13044930 ]

Michael McCandless commented on LUCENE-2645:
--
Thank you both :)
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044940#comment-13044940 ] Martijn van Groningen commented on SOLR-2564: - bq. But: why not just use Double.MAX_VALUE? Yes, I should have used that and I'll change that. I thought that the size was initially used to create the underlying array. But it isn't! The array inside the caching collector initially starts with a length of 128 and grows when needed. How I've currently implemented LUCENE-3097 is that it will only get the most relevant document of each group. In terms of SOLR-236 that is the same as using collapse.threshold=1. I think what Yonik means is increasing the threshold so that more documents end up in the docset that is eventually used by the facet component. Increasing this threshold also means setting when to start to collapse. So setting collapse.threshold=3 means that the collapsing starts from the 4th document. I think that the whole collapse.threshold feature doesn't scale very well. Anyway, I think when we wire the 2nd method (LUCENE-3097) into Solr, we should first make it work for the most relevant group documents. Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
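To make the collapse.threshold semantics described above concrete, here is a rough sketch. It is not the SOLR-236 or LUCENE-3097 implementation; the class, method, and parameter names are made up for illustration, and it assumes documents arrive already sorted by relevance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "keep at most N documents per group".
class CollapseThresholdSketch {

    /**
     * Walks the documents in ranked order and keeps at most {@code threshold}
     * documents per group; later documents of the same group are collapsed away.
     */
    static List<Integer> collapse(int[] docIds, String[] groupOf, int threshold) {
        Map<String, Integer> seenPerGroup = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < docIds.length; i++) {
            int count = seenPerGroup.merge(groupOf[i], 1, Integer::sum);
            if (count <= threshold) {
                kept.add(docIds[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] docs = {1, 2, 3, 4, 5};
        String[] groups = {"a", "a", "b", "a", "a"};
        System.out.println(collapse(docs, groups, 1)); // prints [1, 3]
        System.out.println(collapse(docs, groups, 3)); // prints [1, 2, 3, 4]
    }
}
```

With a threshold of 1 only the most relevant document of each group survives; with a threshold of 3 the fourth document of group "a" is the first one dropped.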
[jira] [Commented] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
[ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044971#comment-13044971 ] Steven Rowe commented on SOLR-1844: --- Hi David, The link in the description is dead - this one mentions the new400common.txt file: http://www.hathitrust.org/node/181 but I'm not sure it's what you were after. Looks like this is the sample you're talking about: http://www.hathitrust.org/blogs/large-scale-search/common-word-list-commongrams - I can see the comma-delimited values there. Would you care to make a patch? CommonGramsQueryFilterFactory should read words in a comma-delimited format --- Key: SOLR-1844 URL: https://issues.apache.org/jira/browse/SOLR-1844 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 1.4 Reporter: David Smiley Priority: Minor CommonGramsQueryFilterFactory expects that the file(s) given to the words argument is a carriage-return delimited list of words. It doesn't support comments either. This file format should be more flexible to support comma delimited values. I came across this because I was trying to use the sample file provided by HathiTrust: http://www.hathitrust.org/node/180(named in a file new400common.txt) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3176) TestNRTThreads test failure
[ https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044978#comment-13044978 ] Robert Muir commented on LUCENE-3176: - on my machine: this one is tough to reproduce. if I run the test by itself, it seems to pass. however, if my machine is busy (e.g. running ant test-core -Dtests.seed=0:0), then it fails! TestNRTThreads test failure --- Key: LUCENE-3176 URL: https://issues.apache.org/jira/browse/LUCENE-3176 Project: Lucene - Java Issue Type: Bug Environment: trunk Reporter: Robert Muir Assignee: Michael McCandless hit a fail in TestNRTThreads running tests over and over: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3176) TestNRTThreads test failure
[ https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044981#comment-13044981 ] Jason Rutherglen commented on LUCENE-3176: -- It's probably the new DWPT code. There was a specific issue to fix this problem LUCENE-2956. TestNRTThreads test failure --- Key: LUCENE-3176 URL: https://issues.apache.org/jira/browse/LUCENE-3176 Project: Lucene - Java Issue Type: Bug Environment: trunk Reporter: Robert Muir Assignee: Michael McCandless hit a fail in TestNRTThreads running tests over and over: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
[ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044988#comment-13044988 ] David Smiley commented on SOLR-1844: On second thought, I think the current behavior is fine because it's consistent with the other filters that need lists of words since they all share the same code to do it -- BaseTokenStreamFactory.getWordSet(...). If any change should happen, it should happen there. I'm fine with this issue being closed as Won't-Fix. It was easy enough for me to simply replace the commas in Hathi's file with a carriage return. CommonGramsQueryFilterFactory should read words in a comma-delimited format --- Key: SOLR-1844 URL: https://issues.apache.org/jira/browse/SOLR-1844 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 1.4 Reporter: David Smiley Priority: Minor CommonGramsQueryFilterFactory expects that the file(s) given to the words argument is a carriage-return delimited list of words. It doesn't support comments either. This file format should be more flexible to support comma delimited values. I came across this because I was trying to use the sample file provided by HathiTrust: http://www.hathitrust.org/node/180(named in a file new400common.txt) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
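The workaround mentioned above (replacing the commas with line breaks) could be scripted roughly as follows. This is only a sketch with hypothetical names; it works on an in-memory string and does not touch Solr or the HathiTrust file itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "the, of, and" into one word per line.
class CommaListToLines {

    static String toLines(String commaDelimited) {
        List<String> words = new ArrayList<>();
        for (String word : commaDelimited.split(",")) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) { // skip empty entries such as ",,"
                words.add(trimmed);
            }
        }
        return String.join("\n", words);
    }

    public static void main(String[] args) {
        System.out.println(toLines("the, of ,and"));
    }
}
```

The resulting text has one word per line, which is the format the issue description says the words argument expects.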
[jira] [Resolved] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
[ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved SOLR-1844. --- Resolution: Won't Fix Assignee: Steven Rowe Thanks David. CommonGramsQueryFilterFactory should read words in a comma-delimited format --- Key: SOLR-1844 URL: https://issues.apache.org/jira/browse/SOLR-1844 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 1.4 Reporter: David Smiley Assignee: Steven Rowe Priority: Minor CommonGramsQueryFilterFactory expects that the file(s) given to the words argument is a carriage-return delimited list of words. It doesn't support comments either. This file format should be more flexible to support comma delimited values. I came across this because I was trying to use the sample file provided by HathiTrust: http://www.hathitrust.org/node/180(named in a file new400common.txt) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045002#comment-13045002 ] Yonik Seeley commented on SOLR-2564: Browsing around this a bit more... the existing solr code selected the string based collectors for any ValueSource of StrFieldSource. This patch resorts to exact getClass() checks against string and text fields, which won't match in as many cases (either derived fields, or user custom fields that don't derive from either of these field types). Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
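A small sketch of why an exact getClass() check matches fewer cases than a type test. The nested classes below are local stand-ins defined only for this example (CustomStrField in particular is hypothetical); they are not the Solr classes, and the sketch covers only the derived-field case, not custom fields that derive from neither type.

```java
// Local stand-in types for illustration; not Solr's schema classes.
class FieldTypeChecks {

    static class FieldType {}
    static class StrField extends FieldType {}
    static class CustomStrField extends StrField {} // hypothetical user-defined subclass

    /** Exact class comparison: true only for StrField itself. */
    static boolean exactMatch(FieldType fieldType) {
        return fieldType.getClass() == StrField.class;
    }

    /** Type test: true for StrField and anything derived from it. */
    static boolean subtypeMatch(FieldType fieldType) {
        return fieldType instanceof StrField;
    }

    public static void main(String[] args) {
        FieldType custom = new CustomStrField();
        System.out.println(exactMatch(custom));   // prints false
        System.out.println(subtypeMatch(custom)); // prints true
    }
}
```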
[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-2399: Attachment: SOLR-2399-sorting-fields.patch SOLR-2399-analysis-stopwords.patch Erick, bq. This one is odd, adding stopwords seems to break analysis... Hm, did not handle removed tokens correctly. Patch attached bq. Oh, for the drop-down for choosing fields or types, would it be possible to order them alphabetically like the schema browser? Yes, Patch attached too. bq. BTW, this whole effort is a long-needed makeover, I'm glad you've taken it on. Can I do something other than complain? Thanks :) For me, it's just fine .. continue using the interface, as you normally would use it. We need just more feedback, other usecases, other schema-/field-definitions, ... -- to see where things are not working as expected! Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * 
[Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) (!) As Erick pointed out .. Chrome's XML-Capabilities are a bit odd, so it does not render Raw-XML-Data (like we're using for displaying the Schema and Config-File) -- instead it looks like this: http://files.mathe.is/solr-admin/00_chrome-xml.png ; so it would be really nice, to see the [xinclude-Interface|http://files.mathe.is/solr-admin/xinclude/] there :) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045017#comment-13045017 ] Stefan Matheis (steffkes) commented on SOLR-2399: - Joan, bq. A: Drill down, so you can select a word in the list of most common words and perform a query. Added to the list, will extend the Query-Form so that it's possible to predefine Field-Values. bq. B. Select the list of fields to be the output of the query. Regarding your patch, that's directly related, right? Selected Field will be used for the query and is the only listed field for the {{fl=}} param. Never thought about that, thanks - will add this also. bq. apart from sorting and showing the field names in alphabetical order and not capitalised. That's the current state, the values are just taken from the /admin/luke-Handler. Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * 
[Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) (!) As Erick pointed out .. Chrome's XML-Capabilities are a bit odd, so it does not render Raw-XML-Data (like we're using for displaying the Schema and Config-File) -- instead it looks like this: http://files.mathe.is/solr-admin/00_chrome-xml.png ; so it would be really nice, to see the [xinclude-Interface|http://files.mathe.is/solr-admin/xinclude/] there :) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045019#comment-13045019 ] Ryan McKinley commented on SOLR-2399: - check revision: 1132724 Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) (!) As Erick pointed out .. 
Chrome's XML-Capabilities are a bit odd, so it does not render Raw-XML-Data (like we're using for displaying the Schema and Config-File) -- instead it looks like this: http://files.mathe.is/solr-admin/00_chrome-xml.png ; so it would be really nice, to see the [xinclude-Interface|http://files.mathe.is/solr-admin/xinclude/] there :) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045027#comment-13045027 ] Yonik Seeley commented on SOLR-2564: I've been checking out the performance, and it generally seems fine. But of course we normally short circuit based on comparators and often don't get beyond that... so to isolate and exercise the rest of the code, I tried a worst-case scenario where the short circuit wouldn't work (sort=_docid_ desc) and solr trunk with this patch is ~16% slower than without it. Any ideas what the problem might be? {code} http://localhost:8983/solr/select?q=*:*&sort=_docid_ desc&group=true&group.cacheMB=0&group.field=single1000_i {code} Note: the single1000_i field is a single-valued int field with 1000 unique values. Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045028#comment-13045028 ] Robert Muir commented on SOLR-2462: --- Thanks for the explanation and updated patch James... I'll test this out shortly! Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Assignee: Robert Muir Priority: Critical Fix For: 3.1.1, 4.0 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010. We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen as occasionally a user will accidentally paste the URL into the Search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
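A back-of-the-envelope sketch of why enumerating every correction combination gets expensive. It assumes, as an upper bound, that each misspelled term receives the full spellcheck.count suggestions, so the number of combinations is the count raised to the number of misspelled terms. The class and method names are hypothetical and this is not SpellPossibilityIterator's code.

```java
import java.math.BigInteger;

// Hypothetical arithmetic helper; an upper bound, not a measurement.
class CollationCombinations {

    /** suggestionsPerTerm ^ misspelledTerms, computed without overflow. */
    static BigInteger combinations(int suggestionsPerTerm, int misspelledTerms) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }

    public static void main(String[] args) {
        // Numbers from the issue: spellcheck.count=15 and about 12 misspelled words.
        System.out.println(combinations(15, 12)); // prints 129746337890625
    }
}
```

Under that assumption the pasted-URL case in the report corresponds to on the order of 10^14 candidate combinations, which is consistent with the memory exhaustion described.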
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045029#comment-13045029 ] Robert Muir commented on SOLR-2571: --- {quote} This version takes all of DirectSolrSpellChecker's parameters as Integer and Float objects rather than Strings, as appropriate. {quote} Did you maybe upload an older patch? I took a look and it only seems to cutover the threshold param. {quote} I'm not sure if this would have validated any unit tests (I didn't see any tests that use DirectSolrSpellChecker). {quote} There is a test (DirectSolrSpellCheckerTest), but it's probably not that great :) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup --- Key: SOLR-2571 URL: https://issues.apache.org/jira/browse/SOLR-2571 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4.1, 3.1, 4.0 Reporter: James Dyer Priority: Minor Labels: whereIsHossManWhenYouNeedHim Fix For: 3.3, 4.0 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value. This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently in the User's mailing list so I am raising an issue now. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045031#comment-13045031 ] Robert Muir commented on SOLR-2571: --- {quote} Possibly this isn't fully accurate but I'm at least mostly correct here. Seems like the discrepancy with DirectSolrSpellChecker is because it isn't returning Frequency info? {quote} This sounds like a bug, care to open a separate issue on it? (we can resolve the int/float stuff here on this one). The thing certainly intends to return freq info... {noformat} SuggestWord[] suggestions = checker.suggestSimilar(new Term(field, token.toString()), options.count, options.reader, options.onlyMorePopular, accuracy); for (SuggestWord suggestion : suggestions) result.add(token, suggestion.string, suggestion.freq); {noformat} IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup --- Key: SOLR-2571 URL: https://issues.apache.org/jira/browse/SOLR-2571 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4.1, 3.1, 4.0 Reporter: James Dyer Priority: Minor Labels: whereIsHossManWhenYouNeedHim Fix For: 3.3, 4.0 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value. This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently in the User's mailing list so I am raising an issue now. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2571: - Attachment: SOLR-2571.patch Here is that patch with Ints/Floats instead of Strings. I made a tiny adjustment to the unit test also. IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup --- Key: SOLR-2571 URL: https://issues.apache.org/jira/browse/SOLR-2571 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4.1, 3.1, 4.0 Reporter: James Dyer Priority: Minor Labels: whereIsHossManWhenYouNeedHim Fix For: 3.3, 4.0 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value. This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently in the User's mailing list so I am raising an issue now. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
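The failure mode described in SOLR-2571 can be reproduced in isolation. The sketch below uses a plain Map as a stand-in for the configuration list (an assumption made to keep the example self-contained; it is not Solr's NamedList) and a hypothetical helper, readThreshold, that accepts either a Number or a String. It is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the parsed configuration.
class ThresholdParsing {

    /** Reads a float setting whether it was stored as a Number or as a String. */
    static float readThreshold(Map<String, Object> config, String key, float defaultValue) {
        Object raw = config.get(key);
        if (raw == null) {
            return defaultValue;
        }
        if (raw instanceof Number) {
            return ((Number) raw).floatValue();
        }
        return Float.parseFloat(raw.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("thresholdTokenFrequency", "0.01"); // the value arrives as a String

        Object raw = config.get("thresholdTokenFrequency");
        try {
            Float broken = (Float) raw; // compiles, but fails at runtime for a String
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: value is a " + raw.getClass().getSimpleName());
        }

        System.out.println(readThreshold(config, "thresholdTokenFrequency", 0.0f));
    }
}
```

Parsing the string form, as the issue description suggests with Float.parseFloat(...), avoids the exception while still accepting values that are already numeric.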
[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-2399: Description: *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin was: *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * 
[Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) (!) As Erick pointed out .. Chrome's XML-Capabilities are a bit odd, so it does not render Raw-XML-Data (like we're using for displaying the Schema and Config-File) -- instead it looks like this: http://files.mathe.is/solr-admin/00_chrome-xml.png ; so it would be really nice, to see the [xinclude-Interface|http://files.mathe.is/solr-admin/xinclude/] there :) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * 
[Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail:
[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-2399: Attachment: SOLR-2399-110606.patch bq. check revision: 1132724 Yes, works. Attached Patch fixes a few smaller Things: * bq. Ryan: schema browser has a funny character to the right of Please Select * bq. Ryan: sometimes the schema-browser page does not load when i click on it -- perhaps because it has a '-' in the name? * Also on the Schema/Config Page, i've replaced the iframe -- which just shows the raw xml files -- through the javascript highlighter (already used for dataimport-config), so will now work also in chrome (w/o extensions). still missing the xinclude-feature -- feedback anyone? * New Style for 'Ping' in Navigation, if the {{/admin/ping}} handler is not available - like in example + multicore-mode. * Core-Admin is now also fluid-width aware Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * 
[Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) (!) As Erick pointed out .. Chrome's XML-Capabilities are a bit odd, so it does not render Raw-XML-Data (like we're using for displaying the Schema and Config-File) -- instead it looks like this: http://files.mathe.is/solr-admin/00_chrome-xml.png ; so it would be really nice, to see the [xinclude-Interface|http://files.mathe.is/solr-admin/xinclude/] there :) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned SOLR-2571: - Assignee: Robert Muir IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup --- Key: SOLR-2571 URL: https://issues.apache.org/jira/browse/SOLR-2571 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4.1, 3.1, 4.0 Reporter: James Dyer Assignee: Robert Muir Priority: Minor Labels: whereIsHossManWhenYouNeedHim Fix For: 3.3, 4.0 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value. This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently on the User's mailing list, so I am raising an issue now. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
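The failure mode described in SOLR-2571 (casting a String config value to Float) and the suggested remedy (parsing it) can be reproduced in isolation. The following is only a minimal sketch, not the attached patch: a plain Map stands in for the NamedList, and the names ThresholdParsing and parseThreshold, as well as the 0.0 default for a missing value, are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class ThresholdParsing {

    /**
     * Hypothetical helper: accepts the value whether it arrives as a
     * Number or as a String, instead of casting it blindly to Float.
     */
    static float parseThreshold(Object value) {
        if (value == null) {
            return 0.0f; // assumed default when the parameter is absent
        }
        if (value instanceof Number) {
            return ((Number) value).floatValue();
        }
        return Float.parseFloat(value.toString().trim());
    }

    public static void main(String[] args) {
        // Values read from an XML config typically arrive as Strings.
        Map<String, Object> config = new HashMap<>();
        config.put("thresholdTokenFrequency", "0.01");
        Object raw = config.get("thresholdTokenFrequency");

        try {
            Float broken = (Float) raw; // the kind of cast the issue describes
            System.out.println("unexpected: " + broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }

        System.out.println("parsed: " + parseThreshold(raw));
    }
}
```

Accepting both Number and String keeps the helper working regardless of how the configuration layer delivers the value.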
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045079#comment-13045079 ] Michael McCandless commented on SOLR-2564: -- Hmmm. Was this with or without caching? Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045082#comment-13045082 ] James Dyer commented on SOLR-2462: -- I added spellcheck.maxCollationEvaluations to the wiki. Thanks, Robert, for taking the time to help get this fixed! Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Assignee: Robert Muir Priority: Critical Fix For: 3.3, 4.0 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010. We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the Search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
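To see why ranking every possible correction combination gets out of hand, it helps to work out the numbers from the report. The sketch below is back-of-the-envelope arithmetic only, assuming each of the ~12 misspelled words actually receives the full spellcheck.count of 15 suggestions (an upper bound); the class and method names are made up for illustration and do not come from Solr.

```java
import java.math.BigInteger;

final class CollationCount {

    /**
     * Upper bound on the number of distinct collations when every
     * misspelled term has the same number of candidate corrections:
     * suggestionsPerTerm raised to the power misspelledTerms.
     */
    static BigInteger combinations(int misspelledTerms, int suggestionsPerTerm) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }

    public static void main(String[] args) {
        // Figures from the report: ~12 misspelled words, spellcheck.count=15.
        System.out.println(combinations(12, 15)); // prints 129746337890625
    }
}
```

Materializing a list of that size is not feasible, which is consistent with the GC behaviour described in the report.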
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045087#comment-13045087 ] Yonik Seeley commented on SOLR-2564: This was without caching to put them on an even footing (and given that the base query was all docs, caching would be slower anyway). The URL above was the actual one used to test. Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045094#comment-13045094 ] Ryan McKinley commented on SOLR-2399: - trying to apply this patch, i get: the chunk size did not match the number of added/removed lines! any ideas? Did you make this patch differently than before? Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » 
https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1736) DateTools.java general improvements
[ https://issues.apache.org/jira/browse/LUCENE-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-1736: - Attachment: LUCENE-1736_DateTools_improvements.patch This is an updated patch. * The former DateFormats class was used as a value in a ThreadLocal which isn't a good idea as it hampers class reloading. * Improvements to a switch statement to benefit from fall-through. * Removed a pointless conversion to Calendar in timeToString() * Moved functionality to Resolution enum, and used arrays of Resolutions indexed by format length instead of large if-else or switch blocks for format parse. The ramification is 48 fewer lines of code. DateTools.java general improvements --- Key: LUCENE-1736 URL: https://issues.apache.org/jira/browse/LUCENE-1736 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.9 Reporter: David Smiley Priority: Minor Fix For: 4.0 Attachments: LUCENE-1736_DateTools_improvements.patch, cleanerDateTools.patch Applying the attached patch shows the improvements to DateTools.java that I think should be done. All logic that does anything at all is moved to instance methods of the inner class Resolution. I argue this is more object-oriented. 1. In cases where Resolution is an argument to the method, I can simply invoke the appropriate call on the Resolution object. Formerly there was a big branch if/else. 2. Instead of synchronized being used seemingly everywhere, synchronized is used to sync on the object that is not threadsafe, be it a DateFormat or Calendar instance. 3. Since different DateFormat and Calendar instances are created per-Resolution, there is now less lock contention since threads using different resolutions will not use the same locks. 4. The old implementation of timeToString rounded the time before formatting it. That's unnecessary since the format only includes the resolution desired. 5. 
round() now uses a switch statement that benefits from fall-through (no break). Another debatable improvement that could be made is putting the resolution instances into an array indexed by format length. This would mean I could remove the switch in lookupResolutionByLength() and avoid the length constants there. Maybe that would be a bit too over-engineered when the switch is fine. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
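Two of the ideas in the LUCENE-1736 description, looking up a resolution from an array indexed by format length and rounding with a switch that relies on fall-through, can be sketched as follows. This is an illustration under stated assumptions and not the committed DateTools code: the class name ResolutionSketch and the method byLength are invented here, and the lengths assume the usual yyyyMMddHHmmssSSS-style format strings.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class ResolutionSketch {

    /** Each resolution records the length of its formatted string. */
    enum Resolution {
        YEAR(4), MONTH(6), DAY(8), HOUR(10), MINUTE(12), SECOND(14), MILLISECOND(17);

        final int formatLen;

        Resolution(int formatLen) {
            this.formatLen = formatLen;
        }
    }

    // Lookup table indexed by format length; unused slots stay null.
    private static final Resolution[] BY_LENGTH = new Resolution[18];
    static {
        for (Resolution r : Resolution.values()) {
            BY_LENGTH[r.formatLen] = r;
        }
    }

    /** Replaces a switch over length constants with one array access. */
    static Resolution byLength(int len) {
        if (len < 0 || len >= BY_LENGTH.length || BY_LENGTH[len] == null) {
            throw new IllegalArgumentException("Unsupported date string length: " + len);
        }
        return BY_LENGTH[len];
    }

    /** Truncates a timestamp (millis, GMT) to the given resolution. */
    static long round(long time, Resolution resolution) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.setTimeInMillis(time);
        // Intentional fall-through: a coarser resolution also clears
        // every finer-grained field below it.
        switch (resolution) {
            case YEAR:        cal.set(Calendar.MONTH, 0);
            case MONTH:       cal.set(Calendar.DAY_OF_MONTH, 1);
            case DAY:         cal.set(Calendar.HOUR_OF_DAY, 0);
            case HOUR:        cal.set(Calendar.MINUTE, 0);
            case MINUTE:      cal.set(Calendar.SECOND, 0);
            case SECOND:      cal.set(Calendar.MILLISECOND, 0);
            case MILLISECOND: break; // nothing left to clear
        }
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        long t = 86_400_000L + 3_723_004L; // 1970-01-02 01:02:03.004 GMT
        System.out.println(byLength("19700102".length()));
        System.out.println(round(t, Resolution.DAY));
    }
}
```

The array trades a few unused slots for removing the length constants, which is the trade-off the description calls debatable.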
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045147#comment-13045147 ] Stefan Matheis (steffkes) commented on SOLR-2399: - bq. any ideas? Did you make this patch differently than before? hm, not really :/ same commands as the patches before. applying it locally, works as expected:
{code}
$ patch -p0 < SOLR-2399-110606.patch
patching file solr/src/webapp/web/tpl/schema-browser.html
patching file solr/src/webapp/web/tpl/dataimport.html
patching file solr/src/webapp/web/tpl/cores.html
patching file solr/src/webapp/web/css/screen.css
patching file solr/src/webapp/web/css/syntax.css
patching file solr/src/webapp/web/js/script.js
{code}
will have a look at this tomorrow, sorry ryan Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * 
[Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2136) Function Queries: if() function
[ https://issues.apache.org/jira/browse/SOLR-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045148#comment-13045148 ] Jan Høydahl commented on SOLR-2136: --- Great Yonik! Is it possible to have exists() work on multi valued fields too without crashing? Function Queries: if() function --- Key: SOLR-2136 URL: https://issues.apache.org/jira/browse/SOLR-2136 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4.1 Reporter: Jan Høydahl Fix For: 4.0 Attachments: SOLR-2136.patch, SOLR-2136.patch Add an if() function which will enable conditional function queries. The function could be modeled after a spreadsheet if function (e.g: http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_IF_function) IF(test; value1; value2) where: test is or refers to a logical value or expression that returns a logical value (TRUE or FALSE). value1 is the value that is returned by the function if test yields TRUE. value2 is the value that is returned by the function if test yields FALSE. If value2 is omitted it is assumed to be FALSE; if value1 is also omitted it is assumed to be TRUE. Example use: if(color==red; 100; if(color==green; 50; 25)) This function will check the document field color, and if it is red return 100, if it is green return 50, else return 25. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
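The nested example in the SOLR-2136 description, if(color==red; 100; if(color==green; 50; 25)), has the same semantics as a chained conditional. The sketch below only mirrors that proposed behaviour in plain Java; it does not use any Solr API, and the names IfFunctionSketch, ifFunc and boostFor are hypothetical.

```java
final class IfFunctionSketch {

    /** IF(test; value1; value2): value1 when test is true, else value2. */
    static double ifFunc(boolean test, double value1, double value2) {
        return test ? value1 : value2;
    }

    /** Mirrors if(color==red; 100; if(color==green; 50; 25)). */
    static double boostFor(String color) {
        return ifFunc("red".equals(color), 100,
                ifFunc("green".equals(color), 50, 25));
    }

    public static void main(String[] args) {
        System.out.println(boostFor("red"));
        System.out.println(boostFor("green"));
        System.out.println(boostFor("blue"));
    }
}
```

Note that plain Java evaluates both value arguments before the call; whether a function-query implementation evaluates them lazily is a separate design question that the issue text does not settle.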
RE: [jira] [Resolved] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
Hi David, Just curious about your use of the HathiTrust list. I usually explain to people that it's customized to our index and they are probably better off making their own list based on the lists of stop words appropriate for the languages in their index (sources listed in the blog post http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance) If you already have an index built and are re-indexing with CommonGrams , you can also use the -t flag with HighFreqTerms.java in lucene contrib to determine the words that have the largest position lists and are therefore candidates to be added to your CommonGrams word list. We recently ran HighFreqTerms.java against our indexes and discovered that it would be better to remove some of the less frequent foreign language stopwords and instead use some very frequent words from the index. Tom Burton-West www.hathitrust.org/blogs From: Steven Rowe (JIRA) [j...@apache.org] Sent: Monday, June 06, 2011 2:08 PM To: dev@lucene.apache.org Subject: [jira] [Resolved] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format [ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved SOLR-1844. --- Resolution: Won't Fix Assignee: Steven Rowe Thanks David. CommonGramsQueryFilterFactory should read words in a comma-delimited format --- Key: SOLR-1844 URL: https://issues.apache.org/jira/browse/SOLR-1844 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 1.4 Reporter: David Smiley Assignee: Steven Rowe Priority: Minor CommonGramsQueryFilterFactory expects that the file(s) given to the words argument is a carriage-return delimited list of words. It doesn't support comments either. This file format should be more flexible to support comma delimited values. 
I came across this because I was trying to use the sample file provided by HathiTrust: http://www.hathitrust.org/node/180 (in a file named new400common.txt) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-1736) DateTools.java general improvements
[ https://issues.apache.org/jira/browse/LUCENE-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe reassigned LUCENE-1736: --- Assignee: Steven Rowe DateTools.java general improvements --- Key: LUCENE-1736 URL: https://issues.apache.org/jira/browse/LUCENE-1736 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.9 Reporter: David Smiley Assignee: Steven Rowe Priority: Minor Fix For: 4.0 Attachments: LUCENE-1736_DateTools_improvements.patch, cleanerDateTools.patch Applying the attached patch shows the improvements to DateTools.java that I think should be done. All logic that does anything at all is moved to instance methods of the inner class Resolution. I argue this is more object-oriented. 1. In cases where Resolution is an argument to the method, I can simply invoke the appropriate call on the Resolution object. Formerly there was a big branch if/else. 2. Instead of synchronized being used seemingly everywhere, synchronized is used to sync on the object that is not threadsafe, be it a DateFormat or Calendar instance. 3. Since different DateFormat and Calendar instances are created per-Resolution, there is now less lock contention since threads using different resolutions will not use the same locks. 4. The old implementation of timeToString rounded the time before formatting it. That's unnecessary since the format only includes the resolution desired. 5. round() now uses a switch statement that benefits from fall-through (no break). Another debatable improvement that could be made is putting the resolution instances into an array indexed by format length. This would mean I could remove the switch in lookupResolutionByLength() and avoid the length constants there. Maybe that would be a bit too over-engineered when the switch is fine. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1736) DateTools.java general improvements
[ https://issues.apache.org/jira/browse/LUCENE-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-1736: Attachment: LUCENE-1736.patch David, this is your patch with a CHANGES.txt entry and a couple of comments added (for javadocs next to the two imports that are javadocs-only; and formatLen spelled out over the shared format string). Nice improvements. All tests pass. I plan on committing shortly. DateTools.java general improvements --- Key: LUCENE-1736 URL: https://issues.apache.org/jira/browse/LUCENE-1736 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.9 Reporter: David Smiley Assignee: Steven Rowe Priority: Minor Fix For: 4.0 Attachments: LUCENE-1736.patch, LUCENE-1736_DateTools_improvements.patch, cleanerDateTools.patch Applying the attached patch shows the improvements to DateTools.java that I think should be done. All logic that does anything at all is moved to instance methods of the inner class Resolution. I argue this is more object-oriented. 1. In cases where Resolution is an argument to the method, I can simply invoke the appropriate call on the Resolution object. Formerly there was a big branch if/else. 2. Instead of synchronized being used seemingly everywhere, synchronized is used to sync on the object that is not threadsafe, be it a DateFormat or Calendar instance. 3. Since different DateFormat and Calendar instances are created per-Resolution, there is now less lock contention since threads using different resolutions will not use the same locks. 4. The old implementation of timeToString rounded the time before formatting it. That's unnecessary since the format only includes the resolution desired. 5. round() now uses a switch statement that benefits from fall-through (no break). Another debatable improvement that could be made is putting the resolution instances into an array indexed by format length. 
This would mean I could remove the switch in lookupResolutionByLength() and avoid the length constants there. Maybe that would be a bit too over-engineered when the switch is fine. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-1736) DateTools.java general improvements
[ https://issues.apache.org/jira/browse/LUCENE-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved LUCENE-1736. - Resolution: Fixed Fix Version/s: 3.3 Committed: - r1132806: trunk - r1132812: branch_3x Thanks David! DateTools.java general improvements --- Key: LUCENE-1736 URL: https://issues.apache.org/jira/browse/LUCENE-1736 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.9 Reporter: David Smiley Assignee: Steven Rowe Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-1736.patch, LUCENE-1736_DateTools_improvements.patch, cleanerDateTools.patch Applying the attached patch shows the improvements to DateTools.java that I think should be done. All logic that does anything at all is moved to instance methods of the inner class Resolution. I argue this is more object-oriented. 1. In cases where Resolution is an argument to the method, I can simply invoke the appropriate call on the Resolution object. Formerly there was a big branch if/else. 2. Instead of synchronized being used seemingly everywhere, synchronized is used to sync on the object that is not threadsafe, be it a DateFormat or Calendar instance. 3. Since different DateFormat and Calendar instances are created per-Resolution, there is now less lock contention since threads using different resolutions will not use the same locks. 4. The old implementation of timeToString rounded the time before formatting it. That's unnecessary since the format only includes the resolution desired. 5. round() now uses a switch statement that benefits from fall-through (no break). Another debatable improvement that could be made is putting the resolution instances into an array indexed by format length. This would mean I could remove the switch in lookupResolutionByLength() and avoid the length constants there. Maybe that would be a bit too over-engineered when the switch is fine. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045192#comment-13045192 ] Ryan McKinley commented on SOLR-2399: - Ok, i tried on linux and it applied OK. TortoiseSVN sometimes barfs when it shouldn't. committed in revision: 1132826 Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is 
automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045194#comment-13045194 ] Ryan McKinley commented on SOLR-2399: - few minor comments... * on http://localhost:8983/solr/#/singlecore/schema-browser/field/text in the Top 10/405 Terms: with the more/less links. I'm not sure adding 10 at a time is really useful. I would rather click 'more' and get all 50, and have 'less' just go back to 10. * Navigation to Schema Browser works great -- thanks again, this is great stuff. thanks! Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** 
Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2136) Function Queries: if() function
[ https://issues.apache.org/jira/browse/SOLR-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045230#comment-13045230 ]

Yonik Seeley commented on SOLR-2136:
------------------------------------

bq. Is it possible to have exists() work on multi valued fields too without crashing?

Not currently... but note that exists() works on subqueries too, not just fields. So a slow way to do it would be
{code}
...exists(query($qq))&qq=myfield:[* TO *]
{code}

Or a faster workaround could be to index a special EXISTS token or EMPTY token and do
{code}
...exists(query($qq))&qq=myfield:EXISTS
{code}

See the test code in TestFunctionQuery for an easy way to use pseudo-fields to test this stuff.

Function Queries: if() function
-------------------------------

Key: SOLR-2136
URL: https://issues.apache.org/jira/browse/SOLR-2136
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Fix For: 4.0
Attachments: SOLR-2136.patch, SOLR-2136.patch

Add an if() function which will enable conditional function queries. The function could be modeled after a spreadsheet if function (e.g: http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_IF_function)

IF(test; value1; value2)

where:
test is or refers to a logical value or expression that returns a logical value (TRUE or FALSE).
value1 is the value that is returned by the function if test yields TRUE.
value2 is the value that is returned by the function if test yields FALSE.

If value2 is omitted it is assumed to be FALSE; if value1 is also omitted it is assumed to be TRUE.

Example use: if(color==red; 100; if(color==green; 50; 25))

This function will check the document field color, and if it is red return 100, if it is green return 50, else return 25.
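The semantics proposed in the SOLR-2136 description can be sketched in plain Java. This is only an illustration of the spreadsheet-style IF(test; value1; value2) contract, not the attached patch and not a Solr API: the class SpreadsheetIf, the ifFn overloads and colorScore are hypothetical names, and a Map stands in for a document.

```java
import java.util.Map;

public class SpreadsheetIf {
    // Three-argument form: value1 when test is true, otherwise value2.
    public static Object ifFn(boolean test, Object value1, Object value2) {
        return test ? value1 : value2;
    }

    // value2 omitted: it is assumed to be FALSE.
    public static Object ifFn(boolean test, Object value1) {
        return ifFn(test, value1, Boolean.FALSE);
    }

    // value1 also omitted: it is assumed to be TRUE.
    public static Object ifFn(boolean test) {
        return ifFn(test, Boolean.TRUE);
    }

    // The example from the issue: if(color==red; 100; if(color==green; 50; 25)).
    // The Map is a hypothetical stand-in for a document with a "color" field.
    public static Object colorScore(Map<String, String> doc) {
        String color = doc.get("color");
        return ifFn("red".equals(color), 100,
                ifFn("green".equals(color), 50, 25));
    }

    public static void main(String[] args) {
        System.out.println(colorScore(Map.of("color", "red")));   // 100
        System.out.println(colorScore(Map.of("color", "green"))); // 50
        System.out.println(colorScore(Map.of("color", "blue")));  // 25
        System.out.println(ifFn(false, 100));                     // false
    }
}
```

Note that Java evaluates both branches before the call, which is harmless for constant values like these; a real function-query implementation could choose to evaluate only the selected branch.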
[jira] [Resolved] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2571.
-------------------------------

Resolution: Fixed

Committed revision 1132855 (trunk). I organized the constants in DirectSolrSpellchecker a bit, so it's easy to see which ones are 'shared' with the others and which ones are unique to it.

Committed revision 1132856 (branch_3x). I backported the test and example here. In the case of this test, it needed to clearIndex() in setup() like trunk does, so I merged these bits also.

Thanks James!

IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
-----------------------------------------------------------------------------------------

Key: SOLR-2571
URL: https://issues.apache.org/jira/browse/SOLR-2571
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Assignee: Robert Muir
Priority: Minor
Labels: whereIsHossManWhenYouNeedHim
Fix For: 3.3, 4.0
Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch

When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value.

This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently in the User's mailing list, so I am raising an issue now.
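The failure mode described in SOLR-2571 can be reproduced outside Solr. The sketch below is not the committed fix: the class ThresholdParsing and its methods are hypothetical, and a plain Map is assumed as a stand-in for the NamedList holding the parsed configuration. It only shows why casting a String-valued entry to Float throws, and how Float.parseFloat(...) avoids it.

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdParsing {
    static final String KEY = "thresholdTokenFrequency";

    // Buggy pattern: assumes the configured value is already a Float.
    // Throws ClassCastException when the value arrives as a String.
    public static float readByCast(Map<String, Object> config) {
        return (Float) config.get(KEY);
    }

    // Tolerant pattern: accepts a Number or a String and parses the latter.
    public static float readByParse(Map<String, Object> config, float defaultValue) {
        Object raw = config.get(KEY);
        if (raw == null) {
            return defaultValue;
        }
        if (raw instanceof Number) {
            return ((Number) raw).floatValue();
        }
        return Float.parseFloat(raw.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put(KEY, ".01"); // values read from XML text arrive as Strings

        try {
            readByCast(config);
        } catch (ClassCastException e) {
            System.out.println("cast failed: String is not a Float");
        }
        System.out.println(readByParse(config, 0.0f));
    }
}
```

Accepting both Number and String keeps the reader working whether the value is declared as a typed element or as plain text.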
[jira] [Commented] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing
[ https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045248#comment-13045248 ]

Robert Muir commented on SOLR-2491:
-----------------------------------

James: any opinion on this with regard to SOLR-2564? I'm totally lost when it comes to grouping, but do you still think collation should use ungrouped queries, or should we wait on SOLR-2564, which seems to suggest it can return this count... I could be confused here and haven't looked in detail though.

spellcheck.maxCollationTries breaks when using FieldCollapsing
--------------------------------------------------------------

Key: SOLR-2491
URL: https://issues.apache.org/jira/browse/SOLR-2491
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2491.patch

If specifying spellcheck.maxCollationTries and group=true on the same query, you never get any Spell Check Collations back. The problem is that SpellCheckCollator relies on ResponseBuilder.getToLog().get("hits") to see how many results each test query returns. When group=true, the toLog isn't populated, so SpellCheckCollator is unable to find a collation that can return results.
[jira] [Commented] (LUCENE-3176) TestNRTThreads test failure
[ https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045283#comment-13045283 ]

Simon Willnauer commented on LUCENE-3176:
-----------------------------------------

Phew! This seems like a delete issue. I have only looked at the output Robert posted so far, but it seems that a FrozenDelPackage gets lost somewhere here. I will look after Buzzwords.

TestNRTThreads test failure
---------------------------

Key: LUCENE-3176
URL: https://issues.apache.org/jira/browse/LUCENE-3176
Project: Lucene - Java
Issue Type: Bug
Environment: trunk
Reporter: Robert Muir
Assignee: Michael McCandless

hit a fail in TestNRTThreads running tests over and over: