[jira] [Updated] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5399: Attachment: SOLR-5399.patch Added some unit tests. Also, I'm including for now the complete shard response in the track section. > Improve DebugComponent for distributed requests > --- > > Key: SOLR-5399 > URL: https://issues.apache.org/jira/browse/SOLR-5399 > Project: Solr > Issue Type: Improvement >Affects Versions: 5.0 >Reporter: Tomás Fernández Löbbe > Attachments: SOLR-5399.patch, SOLR-5399.patch > > > I'm working on extending the DebugComponent for adding some useful > information to be able to track distributed requests better. I'm adding two > different things, first, the request can generate a "request ID" that will be > printed in the logs for the main query and all the different internal > requests to the different shards. This should make it easier to find the > different parts of a single user request in the logs. It would also add the > "purpose" of each internal request to the logs, like: > RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. > Also, I'm adding a "track" section to the debug info where to add information > about the different phases of the distributed request (right now, I'm only > including QTime, but could eventually include more information) like: > {code:xml} > > > > QTime: 10 > QTime: 25 > > > QTime: 1 > > > > {code} > To get this, debugQuery must be set to true, or debug must include > "debug=track". This information is only added to distributed requests. I > would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
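Seen from the client side, the proposal above would be exercised with an ordinary query request. A minimal sketch (the host and collection URL are hypothetical; the parameter names `debug=track` and `debugQuery=true` are taken from the issue description):

```python
# Hedged sketch: building a query URL that asks Solr for the proposed
# tracking info. The host/collection URL is hypothetical; the parameter
# names (debug=track, debugQuery=true) come from the issue description.
from urllib.parse import urlencode

def build_debug_query(base_url, q, track_only=True):
    """Return a /select URL requesting distributed-request tracking debug info."""
    params = {"q": q, "wt": "json"}
    if track_only:
        params["debug"] = "track"       # only the track section
    else:
        params["debugQuery"] = "true"   # full debug output (includes track)
    return base_url + "/select?" + urlencode(params)

print(build_debug_query("http://localhost:8983/solr/collection1", "*:*"))
```

Per the description, the track section would only appear for distributed requests.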
Re: Estimating peak memory use for UnInvertedField faceting
Hi Tom, I believe Solr will automatically use DocValues for faceting if you've defined them in the schema. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Nov 11, 2013 at 11:33 AM, Tom Burton-West wrote: > Thanks Otis, > > I'm looking forward to the presentation videos. > > I'll look into using DocValues. Re-indexing 200 million docs will take a > while though :). > Will Solr automatically use DocValues for faceting if you have DocValues for > the field, or is there some configuration or parameter that needs to be set? > > Tom > > > On Sat, Nov 9, 2013 at 9:57 AM, Otis Gospodnetic > wrote: >> >> Hi Tom, >> >> Check http://blog.sematext.com/2013/11/09/presentation-solr-for-analytics/ >> . It includes info about our experiment with DocValues, which clearly >> shows lower heap usage, which means you'll get further without getting >> this OOM. In our experiments we didn't sort, facet, or group, and I >> see you are faceting, which means that DocValues, which are more >> efficient than FieldCache, should help you even more than it helped >> us. >> >> The graphs are from SPM, which you could use to monitor your Solr >> cluster, at least while you are tuning it. >> >> Otis >> -- >> Performance Monitoring * Log Analytics * Search Analytics >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> On Fri, Nov 8, 2013 at 2:41 PM, Tom Burton-West >> wrote: >> > Hi Yonik, >> > >> > I don't know enough about JVM tuning and monitoring to do this in a >> > clean >> > way, so I just tried setting the max heap at 8GB and then 6GB to force >> > garbage collection. With it set to 6GB it goes into a long GC loop and >> > then runs out of heap (see below).
The stack trace says the issue is >> > with >> > DocTermOrds.uninvert: >> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded >> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405) >> > >> > I'm guessing the actual peak is somewhere between 6 and 8 GB. >> > >> > BTW: is there some documentation somewhere that explains what the stats >> > output to INFO mean? >> > >> > Tom >> > >> > >> > java.lang.OutOfMemoryError: GC overhead limit exceeded> > name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: GC >> > overhead limit exceeded >> > at >> > >> > org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653) >> > at >> > >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366) >> > at >> > >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) >> > at >> > >> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) >> > at >> > >> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) >> > at >> > >> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) >> > at >> > >> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) >> > at >> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548) >> > at >> > >> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) >> > at >> > >> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) >> > at >> > >> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) >> > at >> > >> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) >> > at >> > >> > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) >> > at >> > >> > 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) >> > at >> > >> > org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) >> > at >> > >> > org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) >> > at >> > >> > org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) >> > at java.lang.Thread.run(Thread.java:724) >> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded >> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405) >> > at >> > org.apache.solr.request.UnInvertedField.(UnInvertedField.java:179) >> > at >> > >> > org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664) >> > at >> > org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426) >> > at >> > >> > org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:517) >> > at >> > >> > org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:252) >> > at >> > >> > org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78) >> > at >> > >> > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) >> > at >> > >> > org.apache.solr.handler.Reques
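On Otis's point that declaring DocValues in the schema is enough for faceting to pick them up, a minimal schema.xml sketch might look like the following. The field and type names are illustrative, not from Tom's index; declaring `docValues="true"` (and re-indexing, as Tom notes) is what enables the DocValues-backed path instead of UnInvertedField.

```xml
<!-- Hypothetical schema.xml excerpt: a string facet field with DocValues.
     Field name and attributes are illustrative. -->
<field name="topic_facet" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```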
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819813#comment-13819813 ] Robert Muir commented on LUCENE-2899: - Just some thoughts: I think it would be best to split out the different functionality here into subtasks for each piece, and figure out how each should best be integrated. The current patch does strange things to work around an impedance mismatch in the design, such as the token filter that consumes the entire analysis chain and then replays the whole thing back with POS or NER as payloads. Is it really necessary to give this thing more scope than a single sentence? Typically such tagging models (at least the ones I've worked with) are trained only within sentence scope. Also, payloads should not be used internally; instead, things like TypeAttribute should be used for POS tags. If someone wants to filter out or keep certain POS, they can use existing components like TypeTokenFilter; if they want to index the type as a payload, they can use TypeAsPayloadTokenFilter, and so on. While I can see the POS tagging being useful inside the analysis chain, the NER case is much less clear: I think it's more important to be integrated outside the analysis chain, so that named entities/mentions can be faceted on, added to separate fields for search (likely with a different analysis chain for that), etc. For Lucene that would be an easier way to add these as facets; for Solr it probably makes more sense as an UpdateProcessor than as part of the analysis chain. Finally: I'm confused as to what benefit we get from using OpenNLP directly, versus integrating with it via opennlp-uima? Our UIMA integration at various levels (analysis chain/update processor) is already there, so I'm just wondering if that's a much shorter path. 
> Add OpenNLP Analysis capabilities as a module > - > > Key: LUCENE-2899 > URL: https://issues.apache.org/jira/browse/LUCENE-2899 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 4.6 > > Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, > OpenNLPFilter.java, OpenNLPTokenizer.java > > > Now that OpenNLP is an ASF project and has a nice license, it would be nice > to have a submodule (under analysis) that exposed capabilities for it. Drew > Farris, Tom Morton and I have code that does: > * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it > would have to change slightly to buffer tokens) > * NamedEntity recognition as a TokenFilter > We are also planning a Tokenizer/TokenFilter that can put parts of speech as > either payloads (PartOfSpeechAttribute?) on a token or at the same position. > I'd propose it go under: > modules/analysis/opennlp -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
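The point about carrying POS tags as a token attribute (rather than a payload) so downstream filters can act on them can be sketched generically. This is plain Python, not Lucene's TokenStream API; it only illustrates why a `type` attribute is easy to filter on, in the spirit of TypeTokenFilter:

```python
# Generic sketch of type-based token filtering. NOT Lucene's API; it just
# models tokens carrying a type attribute (cf. TypeAttribute) that a
# downstream filter can match against a stop/keep set.
from collections import namedtuple

Token = namedtuple("Token", ["text", "type"])

def type_filter(tokens, types, keep=False):
    """Drop tokens whose type is in `types` (or keep only those, if keep=True)."""
    for tok in tokens:
        if (tok.type in types) == keep:
            yield tok

tagged = [Token("the", "DT"), Token("quick", "JJ"), Token("fox", "NN")]
print([t.text for t in type_filter(tagged, {"DT"})])             # drop determiners
print([t.text for t in type_filter(tagged, {"NN"}, keep=True)])  # keep only nouns
```

The same attribute could instead be copied into a payload at the end of the chain, which is the division of labor the comment argues for.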
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819791#comment-13819791 ] Robert Muir commented on LUCENE-2899: - Hi Markus: I haven't looked at this patch. I'll review it now and give my thoughts. > Add OpenNLP Analysis capabilities as a module > - > > Key: LUCENE-2899 > URL: https://issues.apache.org/jira/browse/LUCENE-2899 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 4.6 > > Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, > OpenNLPFilter.java, OpenNLPTokenizer.java > > > Now that OpenNLP is an ASF project and has a nice license, it would be nice > to have a submodule (under analysis) that exposed capabilities for it. Drew > Farris, Tom Morton and I have code that does: > * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it > would have to change slightly to buffer tokens) > * NamedEntity recognition as a TokenFilter > We are also planning a Tokenizer/TokenFilter that can put parts of speech as > either payloads (PartOfSpeechAttribute?) on a token or at the same position. > I'd propose it go under: > modules/analysis/opennlp -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5212. - Resolution: Done The fix is committed to openjdk trunk. > java 7u40 causes sigsegv and corrupt term vectors > - > > Key: LUCENE-5212 > URL: https://issues.apache.org/jira/browse/LUCENE-5212 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Attachments: crashFaster.patch, crashFaster2.0.patch, > hs_err_pid32714.log, jenkins.txt > > -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3397) Insure that Replication and Solr Cloud are compatible
[ https://issues.apache.org/jira/browse/SOLR-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819769#comment-13819769 ] ASF subversion and git services commented on SOLR-3397: --- Commit 1540930 from [~erickoerickson] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1540930 ] SOLR-3397: Insure that replication and SolrCloud are compatible. Actually, just log a warning if SolrCloud is detected and master or slave is configured in solrconfig.xml > Insure that Replication and Solr Cloud are compatible > - > > Key: SOLR-3397 > URL: https://issues.apache.org/jira/browse/SOLR-3397 > Project: Solr > Issue Type: Improvement > Components: replication (java), SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Erick Erickson >Assignee: Erick Erickson > Fix For: 4.6, 5.0 > > Attachments: SOLR-3397.patch > > > There has been at least one report of an early-adopter having replication (as > in master/slave) configured with SolrCloud and having very odd results. > Experienced Solr users could reasonably try this (or just have their > configurations from 3.x Solr installations hanging around). Since SolrCloud > takes this functionality over completely, it seems like replication needs to > be made smart enough to disable itself if running under SolrCloud. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3397) Insure that Replication and Solr Cloud are compatible
[ https://issues.apache.org/jira/browse/SOLR-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-3397. -- Resolution: Fixed Fix Version/s: 5.0 4.6 > Insure that Replication and Solr Cloud are compatible > - > > Key: SOLR-3397 > URL: https://issues.apache.org/jira/browse/SOLR-3397 > Project: Solr > Issue Type: Improvement > Components: replication (java), SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Erick Erickson >Assignee: Erick Erickson > Fix For: 4.6, 5.0 > > Attachments: SOLR-3397.patch > > > There has been at least one report of an early-adopter having replication (as > in master/slave) configured with SolrCloud and having very odd results. > Experienced Solr users could reasonably try this (or just have their > configurations from 3.x Solr installations hanging around). Since SolrCloud > takes this functionality over completely, it seems like replication needs to > be made smart enough to disable itself if running under SolrCloud. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.8.0-ea-b114) - Build # 8180 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/8180/ Java: 32bit/jdk1.8.0-ea-b114 -server -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT Error Message: expected:<3> but was:<2> Stack Trace: java.lang.AssertionError: expected:<3> but was:<2> at __randomizedtesting.SeedInfo.seed([C99468FACBB041CE:7C12097D7471F33A]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133) at org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementR
[jira] [Commented] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819746#comment-13819746 ] ASF subversion and git services commented on SOLR-5408: --- Commit 1540922 from [~joel.bernstein] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1540922 ] SOLR-5408 Fix CollapsingQParserPlugin issue with compound sort criteria > Collapsing Query Parser does not respect multiple Sort fields > - > > Key: SOLR-5408 > URL: https://issues.apache.org/jira/browse/SOLR-5408 > Project: Solr > Issue Type: Bug >Affects Versions: 4.5 >Reporter: Brandon Chapman >Assignee: Joel Bernstein >Priority: Critical > Attachments: SOLR-5408.patch, SOLR-5408.patch > > > When using the collapsing query parser, only the last sort field appears to > be used. > http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand} > > > 3002010250210 > > ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD) > > 0.41423172 > > The same query without using the collapsing query parser produces the > expected result. 
[jira] [Updated] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-5408: --- Attachment: SOLR-5408.patch Here's a test case with Joel's fix merged in too. > Collapsing Query Parser does not respect multiple Sort fields > - > > Key: SOLR-5408 > URL: https://issues.apache.org/jira/browse/SOLR-5408 > Project: Solr > Issue Type: Bug >Affects Versions: 4.5 >Reporter: Brandon Chapman >Assignee: Joel Bernstein >Priority: Critical > Attachments: SOLR-5408.patch, SOLR-5408.patch > > > When using the collapsing query parser, only the last sort field appears to > be used. > http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand} > > > 3002010250210 > > ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD) > > 0.41423172 > > The same query without using the collapsing query parser produces the > expected result. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819677#comment-13819677 ] ASF subversion and git services commented on SOLR-5408: --- Commit 1540904 from [~joel.bernstein] in branch 'dev/trunk' [ https://svn.apache.org/r1540904 ] SOLR-5408 Fix CollapsingQParserPlugin issue with compound sort criteria > Collapsing Query Parser does not respect multiple Sort fields > - > > Key: SOLR-5408 > URL: https://issues.apache.org/jira/browse/SOLR-5408 > Project: Solr > Issue Type: Bug >Affects Versions: 4.5 >Reporter: Brandon Chapman >Assignee: Joel Bernstein >Priority: Critical > Attachments: SOLR-5408.patch > > > When using the collapsing query parser, only the last sort field appears to > be used. > http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand} > > > 3002010250210 > > ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD) > > 0.41423172 > > The same query without using the collapsing query parser produces the > expected result. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3397) Insure that Replication and Solr Cloud are compatible
[ https://issues.apache.org/jira/browse/SOLR-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3397: - Attachment: SOLR-3397.patch Patch that logs a warning if master or slave is configured and a zkController is detected. > Insure that Replication and Solr Cloud are compatible > - > > Key: SOLR-3397 > URL: https://issues.apache.org/jira/browse/SOLR-3397 > Project: Solr > Issue Type: Improvement > Components: replication (java), SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Erick Erickson >Assignee: Erick Erickson > Attachments: SOLR-3397.patch > > > There has been at least one report of an early-adopter having replication (as > in master/slave) configured with SolrCloud and having very odd results. > Experienced Solr users could reasonably try this (or just have their > configurations from 3.x Solr installations hanging around). Since SolrCloud > takes this functionality over completely, it seems like replication needs to > be made smart enough to disable itself if running under SolrCloud. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3397) Insure that Replication and Solr Cloud are compatible
[ https://issues.apache.org/jira/browse/SOLR-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819589#comment-13819589 ] ASF subversion and git services commented on SOLR-3397: --- Commit 1540881 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1540881 ] SOLR-3397: Insure that replication and SolrCloud are compatible. Actually, just log a warning if SolrCloud is detected and master or slave is configured in solrconfig.xml > Insure that Replication and Solr Cloud are compatible > - > > Key: SOLR-3397 > URL: https://issues.apache.org/jira/browse/SOLR-3397 > Project: Solr > Issue Type: Improvement > Components: replication (java), SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Erick Erickson >Assignee: Erick Erickson > > There has been at least one report of an early-adopter having replication (as > in master/slave) configured with SolrCloud and having very odd results. > Experienced Solr users could reasonably try this (or just have their > configurations from 3.x Solr installations hanging around). Since SolrCloud > takes this functionality over completely, it seems like replication needs to > be made smart enough to disable itself if running under SolrCloud. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819543#comment-13819543 ] Joel Bernstein commented on SOLR-5408: -- Brandon, I believe this patch should resolve the issue. It was created on branch_4x. If it doesn't apply to your build, let me know and I'll create a patch for the version you're working with. The problem was that the scorer needed to be set on the delegate collector after each segment reader was set. The initial code was setting the scorer on the delegate collector only once, which worked fine for a single sort criterion. Joel > Collapsing Query Parser does not respect multiple Sort fields > - > > Key: SOLR-5408 > URL: https://issues.apache.org/jira/browse/SOLR-5408 > Project: Solr > Issue Type: Bug >Affects Versions: 4.5 >Reporter: Brandon Chapman >Assignee: Joel Bernstein >Priority: Critical > Attachments: SOLR-5408.patch > > > When using the collapsing query parser, only the last sort field appears to > be used. 
> http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand} > > > 3002010250210 > > ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD) > > 0.41423172 > > The same query without using the collapsing query parser produces the > expected result. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
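Joel's explanation of the fix (the scorer must be pushed to the delegate collector after every segment-reader change, not just once) can be modeled outside Lucene. This is illustrative Python, not the actual Java Collector API:

```python
# Illustrative model of a delegating collector. The bug described above was
# forwarding the scorer only once: with multiple segments, the delegate then
# scored later segments with stale scorer state, breaking compound sorts.
class DelegatingCollector:
    def __init__(self, delegate):
        self.delegate = delegate
        self.scorer = None

    def set_scorer(self, scorer):
        self.scorer = scorer
        self.delegate.set_scorer(scorer)

    def set_next_reader(self, reader):
        self.delegate.set_next_reader(reader)
        # The fix: re-apply the current scorer after every segment change.
        if self.scorer is not None:
            self.delegate.set_scorer(self.scorer)

class RecordingCollector:
    """Test double that records the calls it receives."""
    def __init__(self):
        self.calls = []
    def set_scorer(self, scorer):
        self.calls.append(("scorer", scorer))
    def set_next_reader(self, reader):
        self.calls.append(("reader", reader))
```

With this in place, the delegate sees a scorer call after each reader change, instead of only at the start of collection.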
[jira] [Commented] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819545#comment-13819545 ] Joel Bernstein commented on SOLR-5408: -- I'll add a test case for this as well going forward. > Collapsing Query Parser does not respect multiple Sort fields > - > > Key: SOLR-5408 > URL: https://issues.apache.org/jira/browse/SOLR-5408 > Project: Solr > Issue Type: Bug >Affects Versions: 4.5 >Reporter: Brandon Chapman >Assignee: Joel Bernstein >Priority: Critical > Attachments: SOLR-5408.patch > > > When using the collapsing query parser, only the last sort field appears to > be used. > http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand} > > > 3002010250210 > > ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD) > > 0.41423172 > > The same query without using the collapsing query parser produces the > expected result. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5408: Attachment: SOLR-5408.patch
[jira] [Commented] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819524#comment-13819524 ] Joel Bernstein commented on SOLR-5408: I was able to reproduce and am investigating what the issue is.
[jira] [Assigned] (SOLR-5408) Collapsing Query Parser does not respect multiple Sort fields
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein reassigned SOLR-5408: Assignee: Joel Bernstein
[jira] [Updated] (LUCENE-5336) Add a simple QueryParser to parse human-entered queries.
[ https://issues.apache.org/jira/browse/LUCENE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Conradson updated LUCENE-5336: Attachment: LUCENE-5336.patch
Attached an updated version of the patch with the three modifications from my previous comment.
> Add a simple QueryParser to parse human-entered queries.
> Key: LUCENE-5336
> URL: https://issues.apache.org/jira/browse/LUCENE-5336
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Jack Conradson
> Attachments: LUCENE-5336.patch, LUCENE-5336.patch
>
> I would like to add a new simple QueryParser to Lucene that is designed to parse human-entered queries. This parser will operate on an entire entered query using a specified single field or a set of weighted fields (using term boost).
> All features/operations in this parser can be enabled or disabled depending on what is necessary for the user. A default operator may be specified as either 'MUST' representing 'and' or 'SHOULD' representing 'or.' The features/operations that this parser will include are the following:
> * AND specified as '+'
> * OR specified as '|'
> * NOT specified as '-'
> * PHRASE surrounded by double quotes
> * PREFIX specified as '*'
> * PRECEDENCE surrounded by '(' and ')'
> * WHITESPACE specified as ' ' '\n' '\r' and '\t' will cause the default operator to be used
> * ESCAPE specified as '\' will allow operators to be used in terms
> The key differences between this parser and other existing parsers will be the following:
> * No exceptions will be thrown, and errors in syntax will be ignored. The parser will do a best-effort interpretation of any query entered.
> * It uses minimal syntax to express queries. All available operators are single characters or pairs of single characters.
> * The parser is hand-written and in a single Java file making it easy to modify.
[jira] [Commented] (LUCENE-5322) Clean up / simplify Maven-related Ant targets
[ https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819424#comment-13819424 ] ASF subversion and git services commented on LUCENE-5322: Commit 1540849 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1540849 ] LUCENE-5322: 'ant validate-maven-dependencies' doesn't need to call 'filter-pom-templates' directly, since 'generate-maven-artifacts' already does it
> Clean up / simplify Maven-related Ant targets
> Key: LUCENE-5322
> URL: https://issues.apache.org/jira/browse/LUCENE-5322
> Project: Lucene - Core
> Issue Type: Task
> Components: general/build
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Priority: Minor
> Fix For: 4.6, 5.0
> Attachments: LUCENE-5322.patch, LUCENE-5322.validate-maven-artifacts.patch
>
> Many Maven-related Ant targets are public when they don't need to be, e.g. dist-maven and filter-pom-templates, m2-deploy-lucene-parent-pom, etc.
> The arrangement of these targets could be simplified if the directories that have public entry points were minimized.
> generate-maven-artifacts should be runnable from the top level and from lucene/ and solr/.
[jira] [Commented] (LUCENE-5322) Clean up / simplify Maven-related Ant targets
[ https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819393#comment-13819393 ] ASF subversion and git services commented on LUCENE-5322: Commit 1540846 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1540846 ] LUCENE-5322: 'ant validate-maven-artifacts' should depend on 'generate-maven-artifacts'
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1023: POMs out of sync
Looks like I jumped the gun with my latest LUCENE-5322 commit in assuming that ‘ant validate-maven-artifacts’ doesn't need to depend on ‘generate-maven-artifacts’. This problem didn’t surface for me locally because I don’t remove Lucene/Solr artifacts from my local Maven repository before running the target. I’ll put back the ‘generate-maven-artifacts’ dependency.
From the Jenkins log:
——
[artifact:dependencies] Unable to locate resource in repository
[artifact:dependencies] [INFO] Unable to find resource 'org.apache.lucene:lucene-core:jar:5.0-SNAPSHOT' in repository sonatype.releases ( http://oss.sonatype.org/content/repositories/releases )
[artifact:dependencies] An error has occurred while processing the Maven artifact tasks.
[artifact:dependencies] Diagnosis:
[artifact:dependencies]
[artifact:dependencies] Unable to resolve artifact: Missing:
[artifact:dependencies] --
[artifact:dependencies] 1) org.apache.lucene:lucene-codecs:jar:5.0-SNAPSHOT
…
[artifact:dependencies] 2) org.apache.lucene:lucene-core:jar:5.0-SNAPSHOT
——
On Nov 11, 2013, at 3:38 PM, Apache Jenkins Server wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1023/
> No tests ran.
> Build Log: [...truncated 1227 lines...]
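The fix Steve describes amounts to restoring one target dependency, which in Ant is a one-attribute change. The snippet below is only a sketch of the shape of that change; the description text and target body are illustrative, not the actual contents of the Lucene/Solr build files:

```xml
<!-- Sketch only, not copied from the real build.xml. Declaring
     generate-maven-artifacts in depends= guarantees the SNAPSHOT
     artifacts exist in the local repository before the
     artifact:dependencies validation tries to resolve them. -->
<target name="validate-maven-artifacts"
        depends="generate-maven-artifacts"
        description="Generate Maven artifacts, then verify their POM dependencies resolve">
  <!-- per-module artifact:dependencies checks would run here -->
</target>
```

Without the `depends=` attribute, validation passes only on machines whose local repository already holds the snapshot artifacts, which is exactly the failure mode seen on Jenkins.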
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1023: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1023/
No tests ran.
Build Log: [...truncated 1227 lines...]
[jira] [Updated] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
[ https://issues.apache.org/jira/browse/LUCENE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-5337: Attachment: LUCENE-5337.patch
Comment-only change (Javadoc) so precommit succeeds. This cleanly precommits and tests on my machine. Unless there are objections, I'll commit this Wednesday or so, after 4.6 is tagged; I don't see a good reason to rush this into the 4.6 release.
> Add Payload support to FileDictionary (Suggest) and make it more configurable
> Key: LUCENE-5337
> URL: https://issues.apache.org/jira/browse/LUCENE-5337
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Areek Zillur
> Attachments: LUCENE-5337.patch, LUCENE-5337.patch, LUCENE-5337.patch
>
> It would be nice to add payload support to FileDictionary, so users can pass in an associated payload with suggestion entries.
> Currently the FileDictionary has a hard-coded field delimiter (TAB); it would be nice to let users configure the field delimiter as well.
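For readers unfamiliar with FileDictionary: it consumes a plain-text file of suggestion entries, one per line, with fields separated by a delimiter (hard-coded to TAB before this patch). With payload support, a line can carry a suggestion, a weight, and a payload. The lines below are a hypothetical input file, with <TAB> standing in for the delimiter; the exact field order is the one proposed in the issue, not a committed format:

```
mona lisa<TAB>20<TAB>paintings/louvre
starry night<TAB>15<TAB>paintings/moma
```

A configurable delimiter would let users whose suggestion text can itself contain tabs pick a safer separator.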
[jira] [Commented] (LUCENE-5322) Clean up / simplify Maven-related Ant targets
[ https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819332#comment-13819332 ] ASF subversion and git services commented on LUCENE-5322: Commit 1540832 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1540832 ] LUCENE-5322: make 'ant validate-maven-artifacts' run faster
[jira] [Comment Edited] (LUCENE-5322) Clean up / simplify Maven-related Ant targets
[ https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819327#comment-13819327 ] Steve Rowe edited comment on LUCENE-5322 at 11/11/13 8:19 PM:
Currently {{validate-maven-artifacts}} invokes {{filter-pom-templates}} once per POM, which is way too much; also, {{validate-maven-artifacts}} depends on {{generate-maven-artifacts}}, even though it only needs the filtered POMs, and not the built artifacts. This patch fixes both issues. Committing shortly.
was (Author: steve_rowe): Currently {{validate-maven-artifacts}} invokes {{filter-pom-templates}} once per POM, which is way too much; also, {{validate-maven-artifacts}} depends on {{generate-maven-artifacts}}, even though it only needs to filtered POMs, and not the built artifacts. This patch fixes both issues. Committing shortly.
[jira] [Updated] (LUCENE-5322) Clean up / simplify Maven-related Ant targets
[ https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-5322: Attachment: LUCENE-5322.validate-maven-artifacts.patch
Currently {{validate-maven-artifacts}} invokes {{filter-pom-templates}} once per POM, which is way too much; also, {{validate-maven-artifacts}} depends on {{generate-maven-artifacts}}, even though it only needs the filtered POMs, and not the built artifacts. This patch fixes both issues. Committing shortly.
[jira] [Commented] (LUCENE-5336) Add a simple QueryParser to parse human-entered queries.
[ https://issues.apache.org/jira/browse/LUCENE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819275#comment-13819275 ] Jack Conradson commented on LUCENE-5336: Thanks for the feedback. To answer the malformed input question:
* If "foo bar is given as the query, the double quote will be dropped; if whitespace is an operator, term queries are made for both 'foo' and 'bar', otherwise a single term query 'foo bar' is made.
* If foo"bar is given as the query, the double quote will be dropped, and term queries will be made for both 'foo' and 'bar'.
The reason it's done this way is that the parser only backtracks as far as the malformed input (in this case the extraneous double quote), so 'foo' would already be part of the query tree; only a single pass is made over each query. The parser could be changed to do two passes to remove extraneous characters, but I believe that would only make the code more complex, and wouldn't necessarily interpret the query any better for the user, since the malformed character gives no hint as to what he/she really intended.
I will try to post another patch today or tomorrow. I plan to do the following:
* Fix the Javadoc comment
* Add more tests for random operators
* Rename the class to SimpleQueryParser and rename the package to .simple
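Jack's best-effort behavior can be illustrated with a toy tokenizer. This is not the LUCENE-5336 parser (the real implementation is in the attached patch); it is a self-contained sketch of just the quote-handling rule he describes: a balanced pair of quotes yields a phrase, while an unbalanced quote is dropped and the surrounding text falls back to individual terms (whitespace is always treated as an operator here for simplicity).

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of best-effort quote handling, NOT the actual
// LUCENE-5336 parser. An unmatched double quote is silently dropped;
// a matched pair collapses the enclosed terms into one phrase.
public class BestEffortTokens {
    static List<String> parse(String q) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inPhrase = false;
        int phraseStart = -1; // index in 'out' where the phrase's terms began
        for (char c : q.toCharArray()) {
            if (c == '"') {
                if (inPhrase) {
                    // closing quote: merge terms since the opener into one phrase
                    flush(cur, out);
                    StringBuilder phrase = new StringBuilder();
                    for (int i = phraseStart; i < out.size(); i++) {
                        if (phrase.length() > 0) phrase.append(' ');
                        phrase.append(out.get(i));
                    }
                    while (out.size() > phraseStart) out.remove(out.size() - 1);
                    out.add("\"" + phrase + "\"");
                    inPhrase = false;
                } else {
                    flush(cur, out);
                    inPhrase = true;
                    phraseStart = out.size();
                }
            } else if (Character.isWhitespace(c)) {
                flush(cur, out);
            } else {
                cur.append(c);
            }
        }
        flush(cur, out);
        // if inPhrase is still true, the opener was unmatched: the quote
        // is simply dropped and the collected terms stay individual
        return out;
    }

    static void flush(StringBuilder cur, List<String> out) {
        if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
    }

    public static void main(String[] args) {
        System.out.println(parse("\"foo bar"));   // [foo, bar]  (quote dropped)
        System.out.println(parse("foo\"bar"));    // [foo, bar]  (quote dropped)
        System.out.println(parse("\"foo bar\"")); // ["foo bar"] (a phrase)
    }
}
```

Note how the single left-to-right pass mirrors Jack's point: by the time the parser knows the quote is unmatched, 'foo' is already in the result, so the cheapest recovery is to drop the quote rather than re-parse.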
[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging
[ https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819255#comment-13819255 ] Joel Bernstein commented on SOLR-5416: Yes, if all the documents in the same group have the same facet counts then you won't notice this problem.
> CollapsingQParserPlugin bug with Tagging
> Key: SOLR-5416
> URL: https://issues.apache.org/jira/browse/SOLR-5416
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 4.6
> Reporter: David
> Assignee: Joel Bernstein
> Labels: group, grouping
> Fix For: 4.6, 5.0
> Attachments: SOLR-5416.patch
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Trying to use CollapsingQParserPlugin with facet tagging throws an exception.
> {code}
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.add("q", "*:*");
> params.add("fq", "{!collapse field=group_s}");
> params.add("defType", "edismax");
> params.add("bf", "field(test_ti)");
> params.add("fq", "{!tag=test_ti}test_ti:5");
> params.add("facet", "true");
> params.add("facet.field", "{!ex=test_ti}test_ti");
> assertQ(req(params), "*[count(//doc)=1]", "//doc[./int[@name='test_ti']='5']");
> {code}
[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging
[ https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819237#comment-13819237 ] David commented on SOLR-5416: Oh I see, that won't actually be a problem for me, since all of the documents in the group should have the same facet counts. Thanks for the reply. I will wait for a fix. But for now, if I understand your reply correctly, I don't think that will affect my facet counts in a negative manner.
[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging
[ https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819224#comment-13819224 ] Joel Bernstein commented on SOLR-5416: David,
Here is the flow as I see it with the patch:
1) The initial search executes and produces a result based on collapsing the groups by score.
2) The facet component needs to regenerate the docset because of the tag/exclude parameters. But the scorer is not present when regenerating the docset, so it uses logic that overwrites the group-head each time. This results in the document found latest in the index becoming the group-head for each group.
So the result set used to calculate the facets will be different from the result set used to generate the search results. To keep these in sync, step 2 would also need to collapse based on score.
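Joel's two-step flow can be sketched in miniature. The code below is a hypothetical model, not Solr code: collapseByScore stands in for step 1 (the scorer picks each group's head), and collapseNoScorer stands in for step 2 (no scorer available, so the last document seen in index order overwrites the head). It shows how the two passes can disagree on which documents survive the collapse, which is why the facet counts can drift from the search results.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two collapse passes described in the comment,
// NOT the CollapsingQParserPlugin implementation.
public class GroupHeadDemo {
    record Doc(int docId, String group, float score) {}

    // Step 1: scorer present, highest-scoring doc becomes the group-head.
    static List<Integer> collapseByScore(List<Doc> docs) {
        Map<String, Doc> head = new LinkedHashMap<>();
        for (Doc d : docs) {
            Doc h = head.get(d.group());
            if (h == null || d.score() > h.score()) head.put(d.group(), d);
        }
        return head.values().stream().map(Doc::docId).toList();
    }

    // Step 2: no scorer, each doc overwrites the head, so the doc found
    // latest in index order wins.
    static List<Integer> collapseNoScorer(List<Doc> docs) {
        Map<String, Doc> head = new LinkedHashMap<>();
        for (Doc d : docs) head.put(d.group(), d);
        return head.values().stream().map(Doc::docId).toList();
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(
            new Doc(0, "g1", 0.9f), new Doc(1, "g1", 0.2f),
            new Doc(2, "g2", 0.1f), new Doc(3, "g2", 0.7f));
        System.out.println(collapseByScore(index));  // [0, 3]
        System.out.println(collapseNoScorer(index)); // [1, 3]
    }
}
```

For group g1, the two passes pick different heads (doc 0 vs doc 1); when the surviving docs carry different facet values, the facet counts no longer match the collapsed result set.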
[jira] [Updated] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
[ https://issues.apache.org/jira/browse/LUCENE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5337: Attachment: LUCENE-5337.patch
Updated patch: minor changes to fix forbidden-API checks and documentation lint. Thanks Erick and Robert for the review. I updated the patch so that it will pass the validations. I still have the diamond operators in the test code; let me know if there is anything I can do about that.
[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging
[ https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819180#comment-13819180 ] David commented on SOLR-5416: Joel, I'm currently using my patch and the facet counts are correct and the performance is good. We were looking to roll this out in production where I work. Would you advise against it? What kind of problems could this cause?
Re: Estimating peak memory use for UnInvertedField faceting
Thanks Otis,
I'm looking forward to the presentation videos.
I'll look into using DocValues. Re-indexing 200 million docs will take a while though :).
Will Solr automatically use DocValues for faceting if you have DocValues for the field, or is there some configuration or parameter that needs to be set?
Tom
On Sat, Nov 9, 2013 at 9:57 AM, Otis Gospodnetic wrote:
> Hi Tom,
> Check http://blog.sematext.com/2013/11/09/presentation-solr-for-analytics/ . It includes info about our experiment with DocValues, which clearly shows lower heap usage, which means you'll get further without getting this OOM. In our experiments we didn't sort, facet, or group, and I see you are faceting, which means that DocValues, which are more efficient than FieldCache, should help you even more than they helped us.
> The graphs are from SPM, which you could use to monitor your Solr cluster, at least while you are tuning it.
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Fri, Nov 8, 2013 at 2:41 PM, Tom Burton-West wrote:
> > Hi Yonik,
> >
> > I don't know enough about JVM tuning and monitoring to do this in a clean way, so I just tried setting the max heap at 8GB and then 6GB to force garbage collection. With it set to 6GB it goes into a long GC loop and then runs out of heap (see below). The stack trace says the issue is with DocTermOrds.uninvert:
> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
> >
> > I'm guessing the actual peak is somewhere between 6 and 8 GB.
> >
> > BTW: is there some documentation somewhere that explains what the stats output to INFO mean?
> > Tom
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
> > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
> > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
> > at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
> > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
> > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
> > at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
> > at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
> > at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
> > at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
> > at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
> > at java.lang.Thread.run(Thread.java:724)
> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at
org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405) > > at > org.apache.solr.request.UnInvertedField.(UnInvertedField.java:179) > > at > > > org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664) > > at > org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426) > > at > > > org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:517) > > at > > > org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:252) > > at > > > org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78) > > at > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) > > at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) > > ... 16 more > > > > > > --- > > Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField > > INFO: UnInverted multi-valued field {field=topicStr, > > memSize=1,768,10
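To follow up on Tom's question about configuration: switching a facet field such as the topicStr field from the log above over to DocValues is a schema-level change. A minimal sketch (the type name and attribute values here are illustrative assumptions, not taken from Tom's actual schema):

```xml
<!-- Illustrative schema.xml fragment. With docValues="true", Solr builds a
     column-oriented structure on disk at index time, so faceting on this
     field no longer needs the heap-resident UnInvertedField/FieldCache
     representation that is overflowing in the stack trace above. -->
<field name="topicStr" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```

As Tom anticipates, documents indexed before this change carry no DocValues, so a full re-index is required before faceting can use them.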
[jira] [Commented] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
[ https://issues.apache.org/jira/browse/LUCENE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819038#comment-13819038 ] Erick Erickson commented on LUCENE-5337: Robert: OK, I didn't realize that was the case, I was guessing that there'd be some kind of cutover point, probably where we decided to move development pretty much to trunk and start backporting fewer JIRAs... That said, things like diamonds etc. are trivial, I'm quite willing to do those. More complicated things I'll be willing to do on a case-by-case basis, depending probably on how adventurous I'm feeling at the time and how complex the code rearrangement looks. Erick > Add Payload support to FileDictionary (Suggest) and make it more configurable > - > > Key: LUCENE-5337 > URL: https://issues.apache.org/jira/browse/LUCENE-5337 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Areek Zillur > Attachments: LUCENE-5337.patch > > > It would be nice to add payload support to FileDictionary, so user can pass > in associated payload with suggestion entries. > Currently the FileDictionary has a hard-coded field-delimiter (TAB), it would > be nice to let the users configure the field delimiter as well. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-5427) SolrCloud leaking (many) filehandles to deleted files
[ https://issues.apache.org/jira/browse/SOLR-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Bus closed SOLR-5427. -- Resolution: Not A Problem This problem seems to be related to running SOLR inside a Tomcat server. I switched to the bundled Jetty, and the problems are gone. No leaked open files after running the server for about 2 days. Normally, the first open file handles would appear in a few minutes or hours. > SolrCloud leaking (many) filehandles to deleted files > - > > Key: SOLR-5427 > URL: https://issues.apache.org/jira/browse/SOLR-5427 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.3, 4.4, 4.5 > Environment: Debian Linux 6.0 running on VMWare > Tomcat 6 >Reporter: Eric Bus > > I'm running SolrCloud on three nodes. I've been experiencing strange problems > on these nodes. The main problem is that my disk is filling up, because old > tlog files are not being released by SOLR. > I suspect this problem is caused by a lot of open connections between the > nodes in CLOSE_WAIT status. After running a node for only 2 days, the node > already has 33 connections and about 11,000 deleted files that are still open. > I'm running about 100 cores on each node. Could this be contributing to the rate at > which things are going wrong? I suspect that on a setup with only 1 > collection and 3 shards, the problem stays hidden for quite some time.
> lsof -p 15452 -n | grep -i tcp | grep CLOSE_WAIT > java15452 root 45u IPv6 706925770t0 TCP > 11.1.0.12:46533->11.1.0.13:http-alt (CLOSE_WAIT) > java15452 root 48u IPv6 706925790t0 TCP > 11.1.0.12:46535->11.1.0.13:http-alt (CLOSE_WAIT) > java15452 root 205u IPv6 727594340t0 TCP > 11.1.0.12:41744->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root 378u IPv6 723591150t0 TCP > 11.1.0.12:44767->11.1.0.11:http-alt (CLOSE_WAIT) > java15452 root 381u IPv6 723591160t0 TCP > 11.1.0.12:44768->11.1.0.11:http-alt (CLOSE_WAIT) > java15452 root 5252u IPv6 727594450t0 TCP > 11.1.0.12:41751->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root 6193u IPv6 740216510t0 TCP > 11.1.0.12:39170->11.1.0.11:http-alt (CLOSE_WAIT) > java15452 root *150u IPv6 740216480t0 TCP > 11.1.0.12:53865->11.1.0.13:http-alt (CLOSE_WAIT) > java15452 root *152u IPv6 727594240t0 TCP > 11.1.0.12:41737->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *526u IPv6 740279950t0 TCP > 11.1.0.12:53965->11.1.0.13:http-alt (CLOSE_WAIT) > java15452 root *986u IPv6 727686370t0 TCP > 11.1.0.12:42246->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *626u IPv6 727499830t0 TCP > 11.1.0.12:41297->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *476u IPv6 727686330t0 TCP > 11.1.0.12:42243->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *567u IPv6 727686220t0 TCP > 11.1.0.12:42234->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *732u IPv6 727685990t0 TCP > 11.1.0.12:42230->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *799u IPv6 727594270t0 TCP > 11.1.0.12:41739->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *259u IPv6 727686260t0 TCP > 11.1.0.12:42237->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *272u IPv6 727689970t0 TCP > 11.1.0.12:42263->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *493u IPv6 727594070t0 TCP > 11.1.0.12:41729->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *693u IPv6 740209090t0 TCP > 11.1.0.12:53853->11.1.0.13:http-alt (CLOSE_WAIT) > java15452 root *740u IPv6 727499960t0 TCP > 
11.1.0.12:41306->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *749u IPv6 739752300t0 TCP > 11.1.0.12:38825->11.1.0.11:http-alt (CLOSE_WAIT) > java15452 root *750u IPv6 739746190t0 TCP > 11.1.0.12:53499->11.1.0.13:http-alt (CLOSE_WAIT) > java15452 root *771u IPv6 727594200t0 TCP > 11.1.0.12:41734->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *793u IPv6 727686530t0 TCP > 11.1.0.12:42256->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *900u IPv6 727686180t0 TCP > 11.1.0.12:42233->11.1.0.12:http-alt (CLOSE_WAIT) > java15452 root *045u IPv6 727664770t0 TCP > 11.1.0.12:41181->11.1.0.11:http-alt (CLOSE_WAIT) > jav
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #503: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/503/ No tests ran. Build Log: [...truncated 23667 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5205) MoreLikeThis doesn't escape shard queries
[ https://issues.apache.org/jira/browse/SOLR-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819006#comment-13819006 ] Markus Jelsma commented on SOLR-5205: - Shawn, I hadn't noticed bad performance for distributed MLT until just now. It looks like in a 5-shard cluster it fires about 12 queries, most of which are really slow. Doing MLT on a non-distributed node with a large number of documents is lightning fast! Many of the queries fired are distrib=true. > MoreLikeThis doesn't escape shard queries > - > > Key: SOLR-5205 > URL: https://issues.apache.org/jira/browse/SOLR-5205 > Project: Solr > Issue Type: Bug > Components: MoreLikeThis >Affects Versions: 4.4 >Reporter: Markus Jelsma > Fix For: 4.6 > > Attachments: SOLR-5205-trunk.patch, SOLR-5205.patch > > > MoreLikeThis does not support Lucene special characters as ID in distributed > search. ID's containing special characters such as URL's need to be escaped > in the first place. They are then unescaped and get sent to shards in an > unescaped form, causing the org.apache.solr.search.SyntaxError exception. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
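The escaping the issue asks for is backslash-prefixing Lucene query metacharacters in the ID before it is sent to the shards. A rough sketch of that idea as a hypothetical standalone helper (this is not the actual SOLR-5205 patch; SolrJ ships comparable logic in ClientUtils.escapeQueryChars):

```java
public class QueryEscape {
    // Characters with special meaning in Lucene query syntax, plus '/',
    // ';', and whitespace, all of which break an unquoted shard query.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/ ";

    // Prefix every special character with a backslash so an ID such as a
    // URL survives being embedded in a query string sent to a shard.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For a URL-shaped ID like `http://example.com/a?b`, this yields `http\:\/\/example.com\/a\?b`, which the query parser treats as a single literal term.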
[jira] [Commented] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
[ https://issues.apache.org/jira/browse/LUCENE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818982#comment-13818982 ] Robert Muir commented on LUCENE-5337: - {quote} 1> It won't compile in 4x since it uses some Java 7 constructs, I stopped at the "diamond" bit. Unless this is intended for trunk only, could you fix these? {quote} Erick: FYI trunk is on java7. So java7 syntax is actually welcome there and good to use. It's the committer's job to remove such syntax when/if backporting to some branch that doesn't support java7. > Add Payload support to FileDictionary (Suggest) and make it more configurable > - > > Key: LUCENE-5337 > URL: https://issues.apache.org/jira/browse/LUCENE-5337 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Areek Zillur > Attachments: LUCENE-5337.patch > > > It would be nice to add payload support to FileDictionary, so user can pass > in associated payload with suggestion entries. > Currently the FileDictionary has a hard-coded field-delimiter (TAB), it would > be nice to let the users configure the field delimiter as well. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
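For context, the "diamond" Erick and Robert are discussing is Java 7's type-argument inference for constructor calls; a minimal illustration (the class and method names here are invented for the example):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondDemo {
    // Java 7 diamond: the compiler infers <String, List<Integer>> from the
    // declared type. This does not compile under -source 1.6, which is why
    // a patch using it is fine on trunk but fails on a Java 6 branch.
    static Map<String, List<Integer>> withDiamond() {
        return new HashMap<>();
    }

    // Java 6 equivalent a committer would backport to: the type arguments
    // are spelled out again on the right-hand side.
    static Map<String, List<Integer>> withoutDiamond() {
        return new HashMap<String, List<Integer>>();
    }
}
```

Removing the diamond is the kind of mechanical rewrite Robert describes as the committer's job when backporting.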
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818976#comment-13818976 ] Markus Jelsma commented on LUCENE-4753: --- I usually update both, but perhaps I didn't this time, or I didn't get all commits; the latter sometimes happens and then I have to update twice. It's fixed now. > Make forbidden API checks per-module > > > Key: LUCENE-4753 > URL: https://issues.apache.org/jira/browse/LUCENE-4753 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Critical > Fix For: 4.6 > > Attachments: LUCENE-4753.patch, LUCENE-4753.patch, LUCENE-4753.patch > > > After the forbidden API checker was released separately from Lucene as a > Google Code project (forked and improved), including Maven support, the > checks on Lucene should be changed to work per-module. > The reason for this is: The improved checker is more picky about e.g. > extending classes that are forbidden or overriding methods and calling > super.method() if they are on the forbidden signatures list. For these > checks, it is not enough to have the class files and the rt.jar, you need the > whole classpath. The forbidden APIs 1.0 now by default complains if classes > are missing from the classpath. > It is very hard with the module architecture of Lucene/Solr, to make a > uber-classpath, instead the checks should be done per module, so the default > compile/test classpath of the module can be used and no crazy path statements > with **/*.jar are needed. This needs some refactoring in the exclusion lists, > but the Lucene checks could be done by a macro in common-build, that allows > custom exclusion lists for specific modules. 
> Currently, the "strict" checking is disabled for Solr, so the checker only > complains about missing classes but does not fail the build: > {noformat} > -check-forbidden-java-apis: > [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6 > [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6 > [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1 > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt > [forbidden-apis] Loading classes to check... > [forbidden-apis] Scanning for API signatures and dependencies... > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix > the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. > Please fix the classpath! > [forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden > API invocations (in 1.80s), 0 error(s). > {noformat} > I added almost all missing jars, but those do not seem to be in the solr part > of the source tree (i think they are only copied when building artifacts). > With making the whole thing per module, we can use the default classpath of > the module which makes it much easier. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818970#comment-13818970 ] Uwe Schindler commented on LUCENE-4753: --- It is still strange, because the whole change was one single commit. So you would have either nothing or all of the changes. From your error message, it looks as if you only updated the lucene folder and not solr, because this old target only existed in lucene/common-build.xml; if that file was updated, solr would no longer find it under the old name. > Make forbidden API checks per-module > > > Key: LUCENE-4753 > URL: https://issues.apache.org/jira/browse/LUCENE-4753 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Critical > Fix For: 4.6 > > Attachments: LUCENE-4753.patch, LUCENE-4753.patch, LUCENE-4753.patch > > > After the forbidden API checker was released separately from Lucene as a > Google Code project (forked and improved), including Maven support, the > checks on Lucene should be changed to work per-module. > The reason for this is: The improved checker is more picky about e.g. > extending classes that are forbidden or overriding methods and calling > super.method() if they are on the forbidden signatures list. For these > checks, it is not enough to have the class files and the rt.jar, you need the > whole classpath. The forbidden APIs 1.0 now by default complains if classes > are missing from the classpath. > It is very hard with the module architecture of Lucene/Solr, to make a > uber-classpath, instead the checks should be done per module, so the default > compile/test classpath of the module can be used and no crazy path statements > with **/*.jar are needed. This needs some refactoring in the exclusion lists, > but the Lucene checks could be done by a macro in common-build, that allows > custom exclusion lists for specific modules. 
> Currently, the "strict" checking is disabled for Solr, so the checker only > complains about missing classes but does not fail the build: > {noformat} > -check-forbidden-java-apis: > [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6 > [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6 > [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1 > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt > [forbidden-apis] Loading classes to check... > [forbidden-apis] Scanning for API signatures and dependencies... > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix > the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. > Please fix the classpath! > [forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden > API invocations (in 1.80s), 0 error(s). > {noformat} > I added almost all missing jars, but those do not seem to be in the solr part > of the source tree (i think they are only copied when building artifacts). > With making the whole thing per module, we can use the default classpath of > the module which makes it much easier. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
[ https://issues.apache.org/jira/browse/LUCENE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818969#comment-13818969 ] Erick Erickson commented on LUCENE-5337: Areek: There are a couple of problems with this patch... 1> It won't compile in 4x since it uses some Java 7 constructs, I stopped at the "diamond" bit. Unless this is intended for trunk only, could you fix these? 2> on trunk, running "ant precommit" shows the following errors. These are pretty easy to fix, just takes specifying the UTF-8 charset as I remember. They're all in the test code, but still [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7 [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7 [forbidden-apis] Reading API signatures: /Users/Erick/apache/trunk_5337/lucene/tools/forbiddenApis/base.txt [forbidden-apis] Loading classes to check... [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.lucene.search.suggest.FileDictionaryTest (FileDictionaryTest.java:76) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.lucene.search.suggest.FileDictionaryTest (FileDictionaryTest.java:98) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.lucene.search.suggest.FileDictionaryTest (FileDictionaryTest.java:120) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.lucene.search.suggest.FileDictionaryTest (FileDictionaryTest.java:146) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.lucene.search.suggest.FileDictionaryTest (FileDictionaryTest.java:173) [forbidden-apis] Scanned 179 (and 405 
related) class file(s) for forbidden API invocations (in 0.10s), 5 error(s). I can take care of the secretarial stuff here and get this committed. I glanced over the code but don't know the area deeply enough to make any deeper comments; anyone want to chime in on that score? > Add Payload support to FileDictionary (Suggest) and make it more configurable > - > > Key: LUCENE-5337 > URL: https://issues.apache.org/jira/browse/LUCENE-5337 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Areek Zillur > Attachments: LUCENE-5337.patch > > > It would be nice to add payload support to FileDictionary, so user can pass > in associated payload with suggestion entries. > Currently the FileDictionary has a hard-coded field-delimiter (TAB), it would > be nice to let the users configure the field delimiter as well. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
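The charset fix Erick alludes to is mechanical: every flagged String#getBytes() call gets an explicit charset argument. A sketch of the corrected pattern (the class and method names are invented for illustration):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFixDemo {
    // Forbidden by the checker: s.getBytes() with no argument encodes with
    // the JVM's platform default charset, so test fixtures can differ from
    // machine to machine. The fix is to name the charset explicitly.
    // StandardCharsets is Java 7; on a Java 6 branch the equivalent is
    // Charset.forName("UTF-8") (or Lucene's own IOUtils constant).
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] utf8BytesJava6(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }
}
```

Either form produces identical bytes on every JVM, which is exactly what the default-charset overload cannot guarantee.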
[jira] [Resolved] (LUCENE-5338) Let classifiers filter unlabeled documents during training
[ https://issues.apache.org/jira/browse/LUCENE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili resolved LUCENE-5338. - Resolution: Fixed > Let classifiers filter unlabeled documents during training > -- > > Key: LUCENE-5338 > URL: https://issues.apache.org/jira/browse/LUCENE-5338 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Affects Versions: 4.5 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > Only labeled (with 'class' field) documents should be used during training > and therefore each classifier should filter (and not fail when run against) > them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5338) Let classifiers filter unlabeled documents during training
[ https://issues.apache.org/jira/browse/LUCENE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818946#comment-13818946 ] ASF subversion and git services commented on LUCENE-5338: - Commit 1540706 from [~teofili] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1540706 ] LUCENE-5338 - backport to branch_4x > Let classifiers filter unlabeled documents during training > -- > > Key: LUCENE-5338 > URL: https://issues.apache.org/jira/browse/LUCENE-5338 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Affects Versions: 4.5 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > Only labeled (with 'class' field) documents should be used during training > and therefore each classifier should filter (and not fail when run against) > them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818947#comment-13818947 ] Markus Jelsma commented on LUCENE-4753: --- "ant example" from /solr. I svn upped my checkout not long ago and got no updated build.xml. I upped again and I finally received your commit. Svn must be behind. Thanks > Make forbidden API checks per-module > > > Key: LUCENE-4753 > URL: https://issues.apache.org/jira/browse/LUCENE-4753 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Critical > Fix For: 4.6 > > Attachments: LUCENE-4753.patch, LUCENE-4753.patch, LUCENE-4753.patch > > > After the forbidden API checker was released separately from Lucene as a > Google Code project (forked and improved), including Maven support, the > checks on Lucene should be changed to work per-module. > The reason for this is: The improved checker is more picky about e.g. > extending classes that are forbidden or overriding methods and calling > super.method() if they are on the forbidden signatures list. For these > checks, it is not enough to have the class files and the rt.jar, you need the > whole classpath. The forbidden APIs 1.0 now by default complains if classes > are missing from the classpath. > It is very hard with the module architecture of Lucene/Solr, to make a > uber-classpath, instead the checks should be done per module, so the default > compile/test classpath of the module can be used and no crazy path statements > with **/*.jar are needed. This needs some refactoring in the exclusion lists, > but the Lucene checks could be done by a macro in common-build, that allows > custom exclusion lists for specific modules. 
> Currently, the "strict" checking is disabled for Solr, so the checker only > complains about missing classes but does not fail the build: > {noformat} > -check-forbidden-java-apis: > [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6 > [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6 > [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1 > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt > [forbidden-apis] Loading classes to check... > [forbidden-apis] Scanning for API signatures and dependencies... > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix > the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. > Please fix the classpath! > [forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden > API invocations (in 1.80s), 0 error(s). > {noformat} > I added almost all missing jars, but those do not seem to be in the solr part > of the source tree (i think they are only copied when building artifacts). > With making the whole thing per module, we can use the default classpath of > the module which makes it much easier. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5338) Let classifiers filter unlabeled documents during training
[ https://issues.apache.org/jira/browse/LUCENE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818938#comment-13818938 ] ASF subversion and git services commented on LUCENE-5338: - Commit 1540703 from [~teofili] in branch 'dev/trunk' [ https://svn.apache.org/r1540703 ] LUCENE-5338 - avoid considering unlabeled documents for training > Let classifiers filter unlabeled documents during training > -- > > Key: LUCENE-5338 > URL: https://issues.apache.org/jira/browse/LUCENE-5338 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Affects Versions: 4.5 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > Only labeled (with 'class' field) documents should be used during training > and therefore each classifier should filter (and not fail when run against) > them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818933#comment-13818933 ] Uwe Schindler commented on LUCENE-4753: --- Hi, which build target did you call from where? This outdated target "install-forbidden-apis" no longer exists (it was renamed). It looks like you have a checkout with mixed svn revisions, or you have changed some build.xml files yourself and they conflicted. Be sure to: # revert all changes (make sure you save your changes in a diff before doing this) # update your checkout from the root folder (where lucene, dev-tools, and solr subfolders are visible). Updating only the solr or lucene subfolder leads to inconsistency as dependencies inside ANT no longer work # if nothing helps, use a fresh checkout and try again. You can apply the patch from step 1 to restore your changes. Jenkins already verified that everything is fine. I cannot find any problems either: I can call "ant compile/test/check-forbidden-apis/..." from everywhere and it works. > Make forbidden API checks per-module > > > Key: LUCENE-4753 > URL: https://issues.apache.org/jira/browse/LUCENE-4753 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Critical > Fix For: 4.6 > > Attachments: LUCENE-4753.patch, LUCENE-4753.patch, LUCENE-4753.patch > > > After the forbidden API checker was released separately from Lucene as a > Google Code project (forked and improved), including Maven support, the > checks on Lucene should be changed to work per-module. > The reason for this is: The improved checker is more picky about e.g. > extending classes that are forbidden or overriding methods and calling > super.method() if they are on the forbidden signatures list. For these > checks, it is not enough to have the class files and the rt.jar, you need the > whole classpath. 
The forbidden APIs 1.0 now by default complains if classes > are missing from the classpath. > It is very hard with the module architecture of Lucene/Solr, to make a > uber-classpath, instead the checks should be done per module, so the default > compile/test classpath of the module can be used and no crazy path statements > with **/*.jar are needed. This needs some refactoring in the exclusion lists, > but the Lucene checks could be done by a macro in common-build, that allows > custom exclusion lists for specific modules. > Currently, the "strict" checking is disabled for Solr, so the checker only > complains about missing classes but does not fail the build: > {noformat} > -check-forbidden-java-apis: > [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6 > [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6 > [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1 > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt > [forbidden-apis] Loading classes to check... > [forbidden-apis] Scanning for API signatures and dependencies... > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix > the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. > Please fix the classpath! 
> [forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden > API invocations (in 1.80s), 0 error(s). > {noformat} > I added almost all missing jars, but those do not seem to be in the solr part > of the source tree (i think they are only copied when building artifacts). > With making the whole thing per module, we can use the default classpath of > the module which makes it much easier. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818926#comment-13818926 ] Markus Jelsma commented on LUCENE-4753: --- Hi Uwe, I can't compile anymore since something was changed in build.xml {code} BUILD FAILED Target "install-forbidden-apis" does not exist in the project "solr". It is used from target "check-forbidden-apis". Total time: 0 seconds {code} > Make forbidden API checks per-module > > > Key: LUCENE-4753 > URL: https://issues.apache.org/jira/browse/LUCENE-4753 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Critical > Fix For: 4.6 > > Attachments: LUCENE-4753.patch, LUCENE-4753.patch, LUCENE-4753.patch > > > After the forbidden API checker was released separately from Lucene as a > Google Code project (forked and improved), including Maven support, the > checks on Lucene should be changed to work per-module. > The reason for this is: The improved checker is more picky about e.g. > extending classes that are forbidden or overriding methods and calling > super.method() if they are on the forbidden signatures list. For these > checks, it is not enough to have the class files and the rt.jar, you need the > whole classpath. The forbidden APIs 1.0 now by default complains if classes > are missing from the classpath. > It is very hard with the module architecture of Lucene/Solr, to make a > uber-classpath, instead the checks should be done per module, so the default > compile/test classpath of the module can be used and no crazy path statements > with **/*.jar are needed. This needs some refactoring in the exclusion lists, > but the Lucene checks could be done by a macro in common-build, that allows > custom exclusion lists for specific modules. 
> Currently, the "strict" checking is disabled for Solr, so the checker only > complains about missing classes but does not fail the build: > {noformat} > -check-forbidden-java-apis: > [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6 > [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6 > [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1 > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt > [forbidden-apis] Reading API signatures: C:\Users\Uwe > Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt > [forbidden-apis] Loading classes to check... > [forbidden-apis] Scanning for API signatures and dependencies... > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. > Please fix the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix > the classpath! > [forbidden-apis] WARNING: The referenced class > 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. > Please fix the classpath! > [forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden > API invocations (in 1.80s), 0 error(s). > {noformat} > I added almost all missing jars, but those do not seem to be in the solr part > of the source tree (i think they are only copied when building artifacts). > With making the whole thing per module, we can use the default classpath of > the module which makes it much easier.
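For readers unfamiliar with what the jdk-unsafe signature files above actually catch, here is an illustrative, self-contained sketch (not taken from the patch — class and method names here are invented for illustration). Calls like String.toLowerCase() without an explicit Locale are flagged because their result depends on the JVM's default locale:

```java
import java.util.Locale;

// Illustration of the kind of call the jdk-unsafe signatures flag.
// String.toLowerCase() without an explicit Locale depends on the default
// locale, so results can differ between machines (e.g. the Turkish
// dotless-i problem), which is why the checker forbids it.
public class ForbiddenApiExample {

    // Would be flagged by the checker: uses the JVM default locale implicitly.
    static String unsafeLower(String s) {
        return s.toLowerCase();
    }

    // Accepted replacement: locale-independent, reproducible everywhere.
    static String safeLower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(safeLower("TITLE")); // prints "title"
    }
}
```

Running the checker per module, as the issue proposes, lets each module's own compile/test classpath resolve the classes referenced by such calls, avoiding the "cannot be loaded" warnings shown in the log above.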
Solr Release Management Process
Hi; I resolved 2 issues last week. One of them was created by me and the other was an existing issue. There is also a 3rd issue that is a duplicate of the second one. When I create an issue I have the right to edit Fix Version/s; I set 4.6 as the fix version of the first issue. The second issue was not created by me, so I cannot edit its Fix Version/s. I would like to learn the commit process of the Solr project. What do committers do before a new release process starts? If they filter the resolved issues that have a Fix Version/s matching the new release, they will miss resolved issues where it was never set. If they instead filter the issues resolved since the last release, they are not using the benefits of the Fix Version/s field. People have the right to edit the Fix Version/s field when they create an issue, but not to edit existing issues created by other people. There are many issues in the Solr project and frequent commits every day. Should I mention the relevant committer in comments (with an @ tag) for such situations (I follow who is responsible for the next release on the dev list), or do you handle it yourselves (as you have done until now)? I just wanted to learn the internal process of release management. Thanks; Furkan KAMACI
[jira] [Commented] (LUCENE-5311) Make it possible to train / run classification over multiple fields
[ https://issues.apache.org/jira/browse/LUCENE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818890#comment-13818890 ] ASF subversion and git services commented on LUCENE-5311: - Commit 1540675 from [~teofili] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1540675 ] LUCENE-5311 - backport to branch_4x > Make it possible to train / run classification over multiple fields > --- > > Key: LUCENE-5311 > URL: https://issues.apache.org/jira/browse/LUCENE-5311 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/classification >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > It'd be nice to be able to use multiple fields instead of just one for > training / running each classifier.
[jira] [Updated] (LUCENE-5338) Let classifiers filter unlabeled documents during training
[ https://issues.apache.org/jira/browse/LUCENE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated LUCENE-5338: Fix Version/s: 5.0 > Let classifiers filter unlabeled documents during training > -- > > Key: LUCENE-5338 > URL: https://issues.apache.org/jira/browse/LUCENE-5338 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Affects Versions: 4.5 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > Only labeled (with 'class' field) documents should be used during training > and therefore each classifier should filter (and not fail when run against) > them.
[jira] [Commented] (SOLR-5435) An edismax query wrapped in parentheses parsed wrong
[ https://issues.apache.org/jira/browse/SOLR-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818869#comment-13818869 ] Anssi Törmä commented on SOLR-5435: --- This may be related to SOLR-3377. At least that's where I found the workaround to leave a space between parentheses. > An edismax query wrapped in parentheses parsed wrong > > > Key: SOLR-5435 > URL: https://issues.apache.org/jira/browse/SOLR-5435 > Project: Solr > Issue Type: Bug > Components: query parsers >Affects Versions: 4.3.1 >Reporter: Anssi Törmä > > I have an edismax query with the following parameters: > * q={{("jenkins " OR text:"jenkins")}} > ** Yes, there is a space in {{"jenkins "}} > * qf={{used_name^7 text}} > Queries to the field {{used_name}} are analyzed like this > {noformat} > > > pattern="(,|\s)+" >replacement=" "/> > > > > {noformat} > Queries to the field {{text}} are analyzed like this: > {noformat} > > >generateWordParts="0" > generateNumberParts="0" > catenateWords="1" > catenateNumbers="0" > catenateAll="0" > preserveOriginal="1"/> > > > > {noformat} > In Solr admin console, I can see the query is parsed wrongly: > {{+((used_name:jenkins^7.0 | text:jenkins) (used_name:text:^7.0 | (text:text: > text:text)) (used_name:jenkins^7.0 | text:jenkins))}} > See that {{(text:text: text:text)}}? > As a workaround I leave a space between parentheses and what they enclose, > i.e. q={{( "jenkins " OR text:"jenkins" )}}, then the query is parsed as I > expect, i.e. > {{+((used_name:jenkins^7.0 | text:jenkins) text:jenkins)}} > The query is also parsed correctly if there's no space in {{"jenkins"}}.
[jira] [Created] (SOLR-5435) An edismax query wrapped in parentheses parsed wrong
Anssi Törmä created SOLR-5435: - Summary: An edismax query wrapped in parentheses parsed wrong Key: SOLR-5435 URL: https://issues.apache.org/jira/browse/SOLR-5435 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.3.1 Reporter: Anssi Törmä I have an edismax query with the following parameters: * q={{("jenkins " OR text:"jenkins")}} ** Yes, there is a space in {{"jenkins "}} * qf={{used_name^7 text}} Queries to the field {{used_name}} are analyzed like this {noformat} {noformat} Queries to the field {{text}} are analyzed like this: {noformat} {noformat} In Solr admin console, I can see the query is parsed wrongly: {{+((used_name:jenkins^7.0 | text:jenkins) (used_name:text:^7.0 | (text:text: text:text)) (used_name:jenkins^7.0 | text:jenkins))}} See that {{(text:text: text:text)}}? As a workaround I leave a space between parentheses and what they enclose, i.e. q={{( "jenkins " OR text:"jenkins" )}}, then the query is parsed as I expect, i.e. {{+((used_name:jenkins^7.0 | text:jenkins) text:jenkins)}} The query is also parsed correctly if there's no space in {{"jenkins"}}.
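The workaround described in the report — padding the outer parentheses with spaces — can be applied mechanically on the client side before the query string is sent to Solr. A minimal, hypothetical helper (class and method names are invented; this is not part of Solr):

```java
// Hypothetical client-side application of the workaround reported above:
// pad parentheses with spaces so that ("jenkins " OR text:"jenkins")
// becomes ( "jenkins " OR text:"jenkins" ) before the q parameter is sent.
public class EdismaxParenWorkaround {

    // Naive padding: assumes parentheses are query grouping characters and
    // do not occur inside quoted phrases; a real client would need to skip
    // quoted sections.
    static String padParens(String q) {
        return q.replace("(", "( ").replace(")", " )");
    }

    public static void main(String[] args) {
        System.out.println(padParens("(\"jenkins \" OR text:\"jenkins\")"));
    }
}
```

With the padded form, the reporter observed that the edismax parser produces the expected parsed query instead of the spurious {{text:text:}} clauses.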
[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818844#comment-13818844 ] Shai Erera commented on LUCENE-5333: Actually, if we move FacetResultsHandler to FacetRequest and create the new AllDimensionsFR, it doesn't need to setDepth() at all, only override createFacetResultsHandler. And we can add a flattenResults() method to AllDimsFR which takes a FacetResult and returns a List, to simplify app's life. Just an idea. > Support sparse faceting for heterogeneous indices > - > > Key: LUCENE-5333 > URL: https://issues.apache.org/jira/browse/LUCENE-5333 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/facet >Reporter: Michael McCandless > Attachments: LUCENE-5333.patch, LUCENE-5333.patch, LUCENE-5333.patch > > > In some search apps, e.g. a large e-commerce site, the index can have > a mix of wildly different product categories and facet dimensions, and > the number of dimensions could be huge. > E.g. maybe the index has shirts, computer memory, hard drives, etc., > and each of these many categories has different attributes. > In such an index, when someone searches for "so dimm", which should > match a bunch of laptop memory modules, you can't (easily) know up > front which facet dimensions will be important. > But, I think this is very easy for the facet module, since ords are > stored "row stride" (each doc lists all facet labels it has), we could > simply count all facets that the hits actually saw, and then in the > end see which ones "got traction" and return facet results for these > top dims. > I'm not sure what the API would look like, but conceptually this > should work very well, because of how the facet module works. > You shouldn't have to state up front exactly which facet dimensions > to count... 
[jira] [Created] (LUCENE-5338) Let classifiers filter unlabeled documents during training
Tommaso Teofili created LUCENE-5338: --- Summary: Let classifiers filter unlabeled documents during training Key: LUCENE-5338 URL: https://issues.apache.org/jira/browse/LUCENE-5338 Project: Lucene - Core Issue Type: Bug Components: modules/classification Affects Versions: 4.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 4.6 Only labeled (with 'class' field) documents should be used during training and therefore each classifier should filter (and not fail when run against) them.
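The behavior the issue asks for amounts to a pre-filter over the training set: keep only documents that actually carry the class field. A standalone sketch of that idea (not the Lucene classification module's code — documents are modeled here as plain maps purely for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Standalone sketch of the filtering described in LUCENE-5338: only
// documents that have a value for the 'class' field take part in training;
// unlabeled documents are silently skipped instead of causing a failure.
// Map<String,String> stands in for a real document type.
public class TrainingFilter {

    static List<Map<String, String>> labeledOnly(List<Map<String, String>> docs,
                                                 String classField) {
        return docs.stream()
                   .filter(d -> d.get(classField) != null)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("body", "cheap pills", "class", "spam"),
                Map.of("body", "meeting at 10"));           // unlabeled
        System.out.println(labeledOnly(docs, "class").size()); // prints 1
    }
}
```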
[jira] [Updated] (LUCENE-5311) Make it possible to train / run classification over multiple fields
[ https://issues.apache.org/jira/browse/LUCENE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated LUCENE-5311: Fix Version/s: 5.0 4.6 > Make it possible to train / run classification over multiple fields > --- > > Key: LUCENE-5311 > URL: https://issues.apache.org/jira/browse/LUCENE-5311 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/classification >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > It'd be nice to be able to use multiple fields instead of just one for > training / running each classifier.
[jira] [Resolved] (LUCENE-5311) Make it possible to train / run classification over multiple fields
[ https://issues.apache.org/jira/browse/LUCENE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili resolved LUCENE-5311. - Resolution: Fixed for now resolving for naive bayes and knn, work on perceptron will come in a separate issue > Make it possible to train / run classification over multiple fields > --- > > Key: LUCENE-5311 > URL: https://issues.apache.org/jira/browse/LUCENE-5311 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/classification >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.6, 5.0 > > > It'd be nice to be able to use multiple fields instead of just one for > training / running each classifier.
[jira] [Updated] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5333: --- Attachment: LUCENE-5333.patch Patch adds AllDimensionsFacetResultsHandler as a quick prototype of how this can be done. I also modified testTaxonomy to use it instead of AllFacetsAccumulator, and it passes. If we want to proceed with this approach, we can do the following: * Add a new AllDimensionsFacetRequest which either: ** Extends CountFacetRequest, but then we limit it to counting only ** Wraps another FacetRequest so that you can do any aggregation that you want. ** It calls setDepth(2) internally. * Move FacetResultsHandler into FacetRequest, instead of TaxonomyFacetsAccumulator.createFacetResultsHandler. I'll admit that originally that's where it was (in FR), but I moved it to FA in order to simplify FR implementations. But perhaps it does belong w/ FR... The only non-trivial part of this is that you get back a FacetResult, whose children are the actual results, so you cannot simply iterate on res.subResults, but need to realize you should iterate on each subResults.subResults. I don't know if this is considered as complicated or not (I didn't find it very complicated, but maybe I'm biased :)). All-in-all, I think this is somewhat better than the accumulator approach, as it's more intuitive to define a FacetRequest, I think. In the faceted search module, FacetRequest == Query (in the content search jargon), and therefore more user-level than the underlying accumulator. The downside is that it's not automatically supported by SortedSetDVAccumulator, since the latter doesn't respect any FacetRequest, only CountFacetRequest, and also does not let you specify your own FacetResultsHandler, but I think that that's solvable. 
> Support sparse faceting for heterogeneous indices > - > > Key: LUCENE-5333 > URL: https://issues.apache.org/jira/browse/LUCENE-5333 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/facet >Reporter: Michael McCandless > Attachments: LUCENE-5333.patch, LUCENE-5333.patch, LUCENE-5333.patch > > > In some search apps, e.g. a large e-commerce site, the index can have > a mix of wildly different product categories and facet dimensions, and > the number of dimensions could be huge. > E.g. maybe the index has shirts, computer memory, hard drives, etc., > and each of these many categories has different attributes. > In such an index, when someone searches for "so dimm", which should > match a bunch of laptop memory modules, you can't (easily) know up > front which facet dimensions will be important. > But, I think this is very easy for the facet module, since ords are > stored "row stride" (each doc lists all facet labels it has), we could > simply count all facets that the hits actually saw, and then in the > end see which ones "got traction" and return facet results for these > top dims. > I'm not sure what the API would look like, but conceptually this > should work very well, because of how the facet module works. > You shouldn't have to state up front exactly which facet dimensions > to count...
[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818801#comment-13818801 ] Shai Erera commented on LUCENE-5333: I talked with Gilad about it and he suggested a nice solution, with some limitations -- you can create whatever FacetRequest, e.g. CountFacetRequest over the ROOT category and set its depth to 2. That way, if we ask for numResults=10, you basically say "give me the top-10 dimensions (children of ROOT) and for each its top-10 children". This isn't perfect as if you want to get all available dimensions you have to guess what numResults should be set to. And if you ask for a high number, e.g. 100, you ask for the top-100 children of ROOT, and for each its top-100 children. Still, you might not get all dimensions, but it's a very easy way to do this. No need for any custom code. Another limitation is that this is currently supported by TaxonomyFacetsAccumulator, but SortedSetDVAccumulator limits the depth to 1 for all given requests. In that spirit, I can propose another solution - write a FacetResultsHandler which skips the first level of children and returns a FacetResult which has a tree structure, such that the first level are the dimensions and the second level are the actual children. That way, doing new CountFacetRequest(ROOT, 10).setDepth(2) will result in all available dimensions in the first level, but top-10 for each in the second level. To implement such FacetResultsHandler we'd need to iterate over ROOT's children and compute the top-K for each, using e.g. DepthOneFacetResultsHandler... > Support sparse faceting for heterogeneous indices > - > > Key: LUCENE-5333 > URL: https://issues.apache.org/jira/browse/LUCENE-5333 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/facet >Reporter: Michael McCandless > Attachments: LUCENE-5333.patch, LUCENE-5333.patch > > > In some search apps, e.g. 
a large e-commerce site, the index can have > a mix of wildly different product categories and facet dimensions, and > the number of dimensions could be huge. > E.g. maybe the index has shirts, computer memory, hard drives, etc., > and each of these many categories has different attributes. > In such an index, when someone searches for "so dimm", which should > match a bunch of laptop memory modules, you can't (easily) know up > front which facet dimensions will be important. > But, I think this is very easy for the facet module, since ords are > stored "row stride" (each doc lists all facet labels it has), we could > simply count all facets that the hits actually saw, and then in the > end see which ones "got traction" and return facet results for these > top dims. > I'm not sure what the API would look like, but conceptually this > should work very well, because of how the facet module works. > You shouldn't have to state up front exactly which facet dimensions > to count...
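The core idea discussed in this thread — count every facet label the hits actually carry, then keep only the dimensions that "got traction" — can be pictured outside the facet module's API. The sketch below is purely illustrative (class and method names are invented; labels are modeled as "dimension/child" path strings, an assumption, not the module's internal representation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of sparse faceting: each hit lists its facet labels
// as "dimension/child" paths; we count per-dimension occurrences over the
// hits and return the top-K dimensions, without naming any dimension up
// front. Assumes every label contains a '/' separating dimension from child.
public class SparseFacetSketch {

    static List<String> topDimensions(List<List<String>> hitLabels, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> labels : hitLabels) {
            for (String label : labels) {
                String dim = label.substring(0, label.indexOf('/'));
                counts.merge(dim, 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> hits = List.of(
                List.of("Memory/SO-DIMM", "Brand/Acme"),
                List.of("Memory/DDR3"));
        System.out.println(topDimensions(hits, 1)); // prints [Memory]
    }
}
```

Shai's CountFacetRequest-over-ROOT-with-depth-2 proposal achieves the same effect inside the existing API: the first level of the result tree enumerates the dimensions that actually appeared, and the second level holds each dimension's top children.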
Re: Other MacOSX bug - was: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 999 - Failure!
For what it's worth, this bug has been rare, but does show up from time to time. Forget about reproducing it ;) Dawid On Mon, Nov 11, 2013 at 10:41 AM, Seán Coffey wrote: > Thanks for mail Rory. I actually ran into a similar issue last week which > investigating a JNI/Exception reporting issue on Mac. We don't seem to have > many similar reports in JBS (JIRA) - We may have a mac kernel issue. Will > investigate further and log a bug if needed. > > regards, > Sean. > > > On 09/11/2013 20:05, Rory O'Donnell wrote: >> >> Sean, >> >> Can you take a look at this? >> >> Rgds, Rory >> >> >>> On 9 Nov 2013, at 18:59, Uwe Schindler wrote: >>> >>> Hi Rory, >>> >>> in the last weeks (also on developer's machines) we see this bug on OSX >>> in Java 7: >>> >>> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/999/ >>> >>>[junit4] java(186,0x14ebd5000) malloc: *** error for object >>> 0x14ebc3f90: pointer being freed was not allocated >>>[junit4] *** set a breakpoint in malloc_error_break to debug >>> >>> This happens sometimes, but only on OSX (their malloc/free implementation >>> in the libc seems to be more picky than Windows' or Linux's). The JVM >>> crashes, but produces no hs_err file. I investigated your bug tracker, but >>> all of those issues were closed as "not reproducible": http://goo.gl/dsZBrs >>> >>> Can you reopen them or open a new one? This issue does not reproduce, but >>> seems to be a major problem on OSX: This bug and the other one makes JDK 7 >>> on OX unuseable for server apps (like app servers or Lucene search engine >>> installations), because they crash randomly. 
>>> >>> Uwe >>> >>> - >>> Uwe Schindler >>> H.-H.-Meier-Allee 63, D-28213 Bremen >>> http://www.thetaphi.de >>> eMail: u...@thetaphi.de >>> >>> -Original Message- From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de] Sent: Saturday, November 09, 2013 1:33 AM To: dev@lucene.apache.org; rjer...@apache.org Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 999 - Failure! Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/999/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 10462 lines...] [junit4] JVM J0: stderr was not empty, see: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr- core/test/temp/junit4-J0-20131108_235715_812.syserr [junit4] >>> JVM J0: stderr (verbatim) [junit4] java(186,0x14ebd5000) malloc: *** error for object 0x14ebc3f90: pointer being freed was not allocated [junit4] *** set a breakpoint in malloc_error_break to debug [junit4] <<< JVM J0: EOF [...truncated 1 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/ java -XX:+UseCompressedOops -XX:+UseG1GC - XX:+HeapDumpOnOutOfMemoryError - XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=2CD7DE196210E842 - Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false - Dtests.codec=random -Dtests.postingsformat=random - Dtests.docvaluesformat=random -Dtests.locale=random - Dtests.timezone=random -Dtests.directory=random - Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 - Dtests.cleanthreads=perClass - Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false - Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false - Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. 
- Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr-core/test/temp - Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/build/clover/db - Djava.security.manager=org.apache.lucene.util.TestSecurityManager - Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT - Djetty.testMode=1 -Djetty.insecurerandom=1 - Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory - Djava.awt.headless=true -Dtests.disableHdfs=true -classpath /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr- core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr-test- framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/test-framework/lib/junit4-ant- 2.0.13.jar:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene- Solr-trunk-MacOSX/lucene/build/test- framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk- Ma
Re: Other MacOSX bug - was: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 999 - Failure!
Thanks for the mail, Rory. I actually ran into a similar issue last week while investigating a JNI/Exception reporting issue on Mac. We don't seem to have many similar reports in JBS (JIRA) - we may have a Mac kernel issue. Will investigate further and log a bug if needed. regards, Sean. On 09/11/2013 20:05, Rory O'Donnell wrote: Sean, Can you take a look at this? Rgds, Rory On 9 Nov 2013, at 18:59, Uwe Schindler wrote: Hi Rory, in the last weeks (also on developer's machines) we see this bug on OSX in Java 7: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/999/ [junit4] java(186,0x14ebd5000) malloc: *** error for object 0x14ebc3f90: pointer being freed was not allocated [junit4] *** set a breakpoint in malloc_error_break to debug This happens sometimes, but only on OSX (their malloc/free implementation in the libc seems to be more picky than Windows' or Linux's). The JVM crashes, but produces no hs_err file. I investigated your bug tracker, but all of those issues were closed as "not reproducible": http://goo.gl/dsZBrs Can you reopen them or open a new one? This issue does not reproduce, but seems to be a major problem on OSX: This bug and the other one make JDK 7 on OSX unusable for server apps (like app servers or Lucene search engine installations), because they crash randomly. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de] Sent: Saturday, November 09, 2013 1:33 AM To: dev@lucene.apache.org; rjer...@apache.org Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 999 - Failure! Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/999/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 10462 lines...] 
[junit4] JVM J0: stderr was not empty, see: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr- core/test/temp/junit4-J0-20131108_235715_812.syserr [junit4] >>> JVM J0: stderr (verbatim) [junit4] java(186,0x14ebd5000) malloc: *** error for object 0x14ebc3f90: pointer being freed was not allocated [junit4] *** set a breakpoint in malloc_error_break to debug [junit4] <<< JVM J0: EOF [...truncated 1 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/ java -XX:+UseCompressedOops -XX:+UseG1GC - XX:+HeapDumpOnOutOfMemoryError - XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=2CD7DE196210E842 - Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false - Dtests.codec=random -Dtests.postingsformat=random - Dtests.docvaluesformat=random -Dtests.locale=random - Dtests.timezone=random -Dtests.directory=random - Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 - Dtests.cleanthreads=perClass - Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false - Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false - Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. 
- Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr-core/test/temp - Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/build/clover/db - Djava.security.manager=org.apache.lucene.util.TestSecurityManager - Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT - Djetty.testMode=1 -Djetty.insecurerandom=1 - Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory - Djava.awt.headless=true -Dtests.disableHdfs=true -classpath /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr- core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr-test- framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/test-framework/lib/junit4-ant- 2.0.13.jar:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene- Solr-trunk-MacOSX/lucene/build/test- framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucen e-Solr-trunk-MacOSX/solr/build/solr- solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/solr/build/solr- core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0- SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0- SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk- MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0- SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk- Ma
[jira] [Updated] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
[ https://issues.apache.org/jira/browse/LUCENE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5337: - Attachment: LUCENE-5337.patch Initial Patch: - added payload support for FileDictionary - Improved javadocs - made field delimiter configurable - added tests for FileDictionary > Add Payload support to FileDictionary (Suggest) and make it more configurable > - > > Key: LUCENE-5337 > URL: https://issues.apache.org/jira/browse/LUCENE-5337 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Areek Zillur > Attachments: LUCENE-5337.patch > > > It would be nice to add payload support to FileDictionary, so user can pass > in associated payload with suggestion entries. > Currently the FileDictionary has a hard-coded field-delimiter (TAB), it would > be nice to let the users configure the field delimiter as well.
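The file format in question is one suggestion per line with delimited fields. The standalone sketch below illustrates the shape of the change — configurable delimiter plus an optional payload field — but it is not the actual FileDictionary code, and the term/weight/payload field order is an assumption based on the issue description, not on the attached patch:

```java
import java.util.regex.Pattern;

// Standalone sketch of parsing one suggestion-file line as described in
// LUCENE-5337: term<DELIM>weight<DELIM>payload, with the field delimiter
// passed in instead of hard-coded to TAB, and the payload field optional.
// Field order here is assumed for illustration only.
public class SuggestionLine {
    final String term;
    final long weight;
    final String payload;   // null when the line has no third field

    SuggestionLine(String line, String delimiter) {
        // Limit the split to 3 fields so the payload itself may contain
        // the delimiter character.
        String[] fields = line.split(Pattern.quote(delimiter), 3);
        term = fields[0];
        weight = fields.length > 1 ? Long.parseLong(fields[1]) : 1L;
        payload = fields.length > 2 ? fields[2] : null;
    }

    public static void main(String[] args) {
        SuggestionLine s = new SuggestionLine("jenkins\t10\tci-server", "\t");
        System.out.println(s.term + " / " + s.weight + " / " + s.payload);
    }
}
```

A dictionary built this way could hand the payload through to suggesters that support it, while lines without a payload field keep working unchanged.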