[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 639 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/639/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.

REGRESSION: org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean

Error Message:
IOException occured when talking to server at: https://127.0.0.1:54573/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:54573/solr/collection1
	at __randomizedtesting.SeedInfo.seed([37AEFB665E4976:63DCAFF9063A8E54]:0)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
	at org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean(TestBatchUpdate.java:92)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6541 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6541/
Java: 32bit/jdk1.6.0_45 -server -XX:+UseConcMarkSweepGC

2 tests failed.

REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration

Error Message:
No SolrDynamicMBeans found

Stack Trace:
java.lang.AssertionError: No SolrDynamicMBeans found
	at __randomizedtesting.SeedInfo.seed([EA81515636EF4332:6450356C5BAE1B57]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at java.lang.Thread.run(Thread.java:662)

REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxUpdate

Error Message:
No mbean found for SolrIndexSearcher

Stack Trace:
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712195#comment-13712195 ] Alan Woodward commented on SOLR-4478: - Have found another problem here - what to do with core-specific properties? Core properties are passed to the SolrConfig object at construction, so there's no way at present to use a new set of properties with an existing configset. Same with IndexSchema, which re-uses the resource loader from SolrConfig. Allow cores to specify a named config set - Key: SOLR-4478 URL: https://issues.apache.org/jira/browse/SOLR-4478 Project: Solr Issue Type: Improvement Affects Versions: 4.2, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4478.patch, SOLR-4478.patch Part of moving forward to the new way, after SOLR-4196 etc... I propose an additional parameter specified on the core node in solr.xml or as a parameter in the discovery mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core. Straw-man: There will be a directory solr_home/configsets which will be the default. If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like:
solr_home/configsets/myconf/schema.xml
solrconfig.xml
stopwords.txt
velocity
velocity/query.vm
etc.
If multiple cores used the same configSet, schema, solrconfig etc. would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored but maybe log a warning? Mostly I'm putting this up for comments. 
I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going. Configset can be either a relative or absolute path, if relative it's assumed to be relative to solr_home. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
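For concreteness, here is a rough sketch of how the straw-man above might look on disk in discovery mode. The property name configSet and the exact paths follow the proposal in this thread; none of this is final or an existing Solr feature:

```properties
# solr_home/core1/core.properties (hypothetical discovery-mode core)
name=core1
configSet=myconf

# The shared config then lives once, under the default configsets directory:
#   solr_home/configsets/myconf/solrconfig.xml
#   solr_home/configsets/myconf/schema.xml
#   solr_home/configsets/myconf/stopwords.txt
```

Any second core pointing at configSet=myconf would share the same SolrConfig and IndexSchema objects, which is exactly why Alan's core-specific-properties question above is awkward: the shared objects are constructed once, before per-core properties are known.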
Contentions observed in lucene execution
Hi All, I need some help in analyzing some contentions we observe in the Lucene execution. We are supporting the Sterling 9.0 fulfillment application, and it uses Lucene 2.4 for catalog search functionality.

--- The issue ---
This system has been live in production since Nov 2012, and only recently (mid June 2013) has our application started forming stuck threads during Lucene invocations; this causes the application to crash. It occurs 2-3 times a week, and on other days we see spikes of very slow performance at the exact places that cause the stuck threads.

--- The research ---
We have validated that the data and the usage have not grown between Jan 2012 and now. We took a snapshot of the code execution (through VisualVM), and for slow-running threads we validated that too much time is spent at certain spots (these same spots appear in the stack traces of the stuck threads).

--- Help needed ---
If you can guide me on what kinds of contention (heap, IO, data, CPU, JVM params) can cause such behavior, it will really help.

--- Lucene invocation contentions observed ---
(We find stuck threads / slowness at the following spots, ordered by severity [high to low].)

1. java.io.RandomAccessFile.readBytes(Native Method)
   java.io.RandomAccessFile.read(RandomAccessFile.java:338)
   org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)

2. org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
   org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:167)
   org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:373)
   org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
   org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

3. org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
   org.apache.lucene.index.TermBuffer.read(TermBuffer.java:65)
   org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127)
   org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:389)
   org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
   org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

4. java.io.RandomAccessFile.seek(Native Method)
   org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
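Hot spots 2 and 3 above both sit under FieldCache population: the first sort or facet against a field after a reader (re)open walks every term of that field to build its StringIndex, and FieldCacheImpl$Cache.get makes other threads asking for the same entry wait while one thread builds it. That cold-cache convoy is easy to model without Lucene at all; a minimal sketch of the pattern (class and method names are mine, not Lucene's):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of a compute-on-first-access cache like FieldCacheImpl:
 *  the first reader of a key computes an expensive value while everyone
 *  else asking for that key blocks on the same lock. */
class ColdCache {
    static final AtomicInteger loads = new AtomicInteger();
    private final Map<String, Object> values = new HashMap<String, Object>();

    Object get(String field) {
        synchronized (values) {          // all request threads funnel through here
            Object v = values.get(field);
            if (v == null) {
                v = createValue(field);  // in Lucene: a scan of every term in the field
                values.put(field, v);
            }
            return v;
        }
    }

    private Object createValue(String field) {
        loads.incrementAndGet();         // expensive work happens exactly once per key
        return field + "-index";
    }
}
```

On Lucene 2.4 the usual mitigation is to warm a newly opened IndexReader (run representative sorting/faceting queries once) before exposing it to request threads, so this one-time load happens off the hot path instead of stalling live searches.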
[jira] [Updated] (LUCENE-4734) FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
[ https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4734: - Attachment: LUCENE-4734.patch Ryan, I iterated over your patch in order to be able to handle a few more queries, specifically phrase queries that contain gaps or have several terms at the same position. It is very hard to handle all possibilities without making the highlighting complexity explode. I'm looking forward to LUCENE-2878 so that highlighting can be more efficient and doesn't need to duplicate the query interpretation logic anymore. FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight Key: LUCENE-4734 URL: https://issues.apache.org/jira/browse/LUCENE-4734 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0, 4.1, 5.0 Reporter: Ryan Lauck Labels: fastvectorhighlighter, highlighter Fix For: 4.4 Attachments: lucene-4734.patch, LUCENE-4734.patch If a proximity phrase query overlaps with any other query term it will not be highlighted. Example Text: A B C D E F G Example Queries: "B E"~10 D (D will be highlighted instead of B C D E) "B E"~10 "C F"~10 (nothing will be highlighted) This can be traced to the FieldPhraseList constructor's inner while loop. From the first example query, the first TermInfo popped off the stack will be B. The second TermInfo will be D, which will not be found in the submap for "B E"~10 and will trigger a failed match.
[jira] [Closed] (LUCENE-4118) FastVectorHighlighter fail to highlight taking in input some proximity query.
[ https://issues.apache.org/jira/browse/LUCENE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand closed LUCENE-4118. Resolution: Duplicate Duplicate of LUCENE-4734 FastVectorHighlighter fail to highlight taking in input some proximity query. - Key: LUCENE-4118 URL: https://issues.apache.org/jira/browse/LUCENE-4118 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 3.4, 5.0 Reporter: Emanuele Lombardi Assignee: Koji Sekiguchi Labels: FastVectorHighlighter Attachments: FVHPatch.txt There are 2 related bugs with proximity queries: 1) If a phrase contains n repeated terms, the FVH module fails to highlight it; see testRepeatedTermsWithSlop. 2) If you search the terms reversed, the FVH module fails to highlight them; see testReversedTermsWithSlop.
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712241#comment-13712241 ] Markus Jelsma commented on SOLR-4816: - Joel, it is working perfectly and already runs fine in one production environment. It's about 30% more efficient when sending data from 20 Hadoop reducers to 10 Solr SSD nodes using routing than the current method. We didn't implement the routable deletes - we're still using SolrServer.deleteById(), seems UpdateRequestExt is not going to be the definitive API to talk to, right? I assume it won't make it in 4.4 but we should make an effort to get committed to trunk and/or 4.5 some day soon. Add document routing to CloudSolrServer --- Key: SOLR-4816 URL: https://issues.apache.org/jira/browse/SOLR-4816 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.3 Reporter: Joel Bernstein Assignee: Mark Miller Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch This issue adds the following enhancements to CloudSolrServer's update logic: 1) Document routing: Updates are routed directly to the correct shard leader eliminating document routing at the server. 2) Optional parallel update execution: Updates for each shard are executed in a separate thread so parallel indexing can occur across the cluster. These enhancements should allow for near linear scalability on indexing throughput. 
Usage:

CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
cloudClient.setParallelUpdates(true);
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 0);
doc1.addField("a_t", "hello1");
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", 2);
doc2.addField("a_t", "hello2");
UpdateRequest request = new UpdateRequest();
request.add(doc1);
request.add(doc2);
request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
NamedList response = cloudClient.request(request); // Returns a backwards compatible condensed response.
// To get a more detailed response, down cast to RouteResponse:
CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;
4.0 and 4.1 FieldCacheImpl.DocTermsImpl.exists(docid) possibly broken
Hi, just an FYI - may be helpful for anyone obliged to use 4.0.0 or 4.1.0 - it seems that this method is actually doing the opposite of its intention. I did not find mentions of this in the lists or elsewhere. This is the code for o.a.l.search.FieldCacheImpl.DocTermsImpl.exists(int):

public boolean exists(int docID) {
  return docToOffset.get(docID) == 0;
}

Its description says: "Returns true if this doc has this field and is not deleted." But it returns true for docs not containing the field and false for those that do contain it. A simple workaround is to not call this method before calling getTerm(), but rather just rely on getTerm()'s logic: "... returns the same BytesRef, or an empty (length=0) BytesRef if the doc did not have this field or was deleted." So usage code can be like this:

DocTerms values = FieldCache.DEFAULT.getTerms(reader, FIELD_NAME);
BytesRef term = new BytesRef();
for (int docid = 0; docid < values.size(); docid++) {
  term = values.getTerm(docid, term);
  if (term.length > 0) {
    doSomethingWith(term.utf8ToString());
  }
}
FieldCache.DEFAULT.purge(reader);

I am not sure about the overhead of this compared to first checking exists(), but it at least works correctly. The code for exists() was as above until R1442497 (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?revision=1442497&view=markup), and then in R1443717 (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?r1=1442497&r2=1443717&diff_format=h) the API was changed as part of LUCENE-4547 (DocValues improvements), which was included in 4.2.
Simple code to demonstrate this (here with 4.1, but same results with 4.0):

RAMDirectory d = new RAMDirectory();
IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_41, new SimpleAnalyzer(Version.LUCENE_41)));
w.addDocument(new Document()); // Empty doc (0, 0)
Document doc = new Document(); // Real doc (1, 1)
doc.add(new StringField("f1", "v1", Store.NO));
w.addDocument(doc);
w.addDocument(new Document()); // Empty doc (2, 2)
w.addDocument(new Document()); // Empty doc (3, 3)
w.commit(); // Commit - so we'll have two atomic readers
doc = new Document(); // Real doc (0, 4)
doc.add(new StringField("f1", "v2", Store.NO));
w.addDocument(doc);
w.addDocument(new Document()); // Empty doc (1, 5)
w.close();
IndexReader r = DirectoryReader.open(d);
BytesRef br = new BytesRef();
for (AtomicReaderContext leaf : r.leaves()) {
  System.out.println("--- new atomic reader");
  AtomicReader reader = leaf.reader();
  DocTerms a = FieldCache.DEFAULT.getTerms(reader, "f1");
  for (int i = 0; i < reader.maxDoc(); ++i) {
    int n = leaf.docBase + i;
    System.out.println(n + " exists: " + a.exists(i));
    br = a.getTerm(i, br);
    if (br.length > 0) {
      System.out.println(n + " " + br.utf8ToString());
    }
  }
}

The result printing:

--- new atomic reader
0 exists: true
1 exists: false
1 v1
2 exists: true
3 exists: true
--- new atomic reader
4 exists: false
4 v2
5 exists: true

Indeed, exists() results are wrong. So again, just an FYI, as this API no longer exists... Regards, Doron
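The inversion is visible without Lucene at all: docToOffset stores 0 for documents that have no value and a positive offset into the term data for documents that do, so testing == 0 is exactly backwards. A toy model of the predicate (the array stands in for the packed docToOffset; the offsets are made up for illustration):

```java
/** Toy model of DocTermsImpl's exists() predicate: an offset of 0 means
 *  "this doc has no value for the field". */
class DocTermsModel {
    private final int[] docToOffset;

    DocTermsModel(int[] docToOffset) {
        this.docToOffset = docToOffset;
    }

    /** The 4.0/4.1 behaviour described above: inverted. */
    boolean existsBuggy(int docID) {
        return docToOffset[docID] == 0;
    }

    /** What the javadoc promises. */
    boolean existsFixed(int docID) {
        return docToOffset[docID] != 0;
    }
}
```

With offsets {0, 7, 0} (only doc 1 has a value), existsBuggy reports true for docs 0 and 2 and false for doc 1 - the same inverted pattern as the printout above. Doron's workaround of checking getTerm(...).length > 0 sidesteps the predicate entirely.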
[jira] [Updated] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too
[ https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated LUCENE-5091: Fix Version/s: (was: 4.4) 4.5 Modify SpanNotQuery to act as SpanNotNearQuery too -- Key: LUCENE-5091 URL: https://issues.apache.org/jira/browse/LUCENE-5091 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: 4.3.1 Reporter: Tim Allison Priority: Minor Fix For: 4.5 Attachments: LUCENE-5091.patch.txt With very small modifications, SpanNotQuery can act as a SpanNotNearQuery. To find a but not if b appears 3 tokens before or 4 tokens after a: new SpanNotQuery(a, b, 3, 4) Original constructor still exists and calls SpanNotQuery(a, b, 0, 0). Patch with tests on way.
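The proposed semantics are easy to state without the Spans machinery: an include ("a") match is rejected when an exclude ("b") match falls inside the include span widened by pre tokens before and post tokens after. A small position-arithmetic sketch of that test (illustrative only; this is not the actual patch or the Spans API, and spans here are half-open [start, end) position intervals):

```java
/** Sketch of the pre/post exclusion window proposed for SpanNotQuery. */
class SpanNotWindow {
    /** Reject an include span [incStart, incEnd) when an exclude span
     *  [excStart, excEnd) overlaps the widened window
     *  [incStart - pre, incEnd + post). Standard interval-overlap test. */
    static boolean rejected(int incStart, int incEnd,
                            int excStart, int excEnd,
                            int pre, int post) {
        return excEnd > incStart - pre && excStart < incEnd + post;
    }
}
```

With pre=3, post=4 and an include span at [5, 6), an exclude at [2, 3) (three tokens before) or at [9, 10) (within four tokens after) rejects the match, while excludes at [1, 2) and [10, 11) do not; pre=0, post=0 degenerates to plain SpanNotQuery overlap, matching the "Original constructor still exists" note above.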
[jira] [Commented] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too
[ https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712253#comment-13712253 ] Tim Allison commented on LUCENE-5091: - With the push for 4.4 on, I've moved this to 4.5. If someone has a chance to review this, that'd be great. Thank you!
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms? -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 1:36 AM To: dev@lucene.apache.org Subject: Programmatic Synonyms Filter (Lucene and/or Solr) Hi I was asked to integrate with a system which provides synonyms for words through API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front. E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API. Solr SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either. The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word). So I have few questions: 1) Did I miss a Filter which does take a programmatic syn-map which I can provide my own impl to? 2) If not, Would it make sense to modify SynonymMap to offer getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl so that users can provide their own impl, e.g. not requiring everything to be in RAM? 2.1) Side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of may use cases for not using SynonymFilter as-is ... 
3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right? Shai
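Absent a pluggable SynonymMap, the shape described here - a String[] getSynonyms(String word) backend consulted lazily, with only a bounded working set held in RAM - can be sketched in plain Java. The interface and cache below are hypothetical, not a Lucene or Solr API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pluggable synonym source, looked up on demand rather than preloaded. */
interface SynonymProvider {
    String[] getSynonyms(String word);
}

/** Bounded LRU cache in front of a (remote, too-big-for-RAM) provider. */
class CachingSynonymProvider implements SynonymProvider {
    private final SynonymProvider backend;
    private final Map<String, String[]> lru;

    CachingSynonymProvider(SynonymProvider backend, final int maxEntries) {
        this.backend = backend;
        // access-order LinkedHashMap + removeEldestEntry = simple LRU bound
        this.lru = new LinkedHashMap<String, String[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<String, String[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public String[] getSynonyms(String word) {
        String[] syns = lru.get(word);
        if (syns == null) {
            syns = backend.getSynonyms(word); // the remote API call in Shai's scenario
            lru.put(word, syns);
        }
        return syns;
    }
}
```

A custom TokenFilter could consult such a provider per token; the LRU bound means the full synonym DB never has to fit in memory, at the cost of a remote call on every cache miss - which is also why hundreds or thousands of synonyms per word make query-time expansion questionable, as the follow-up below notes.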
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
The examples I've seen so far are single words. But I learned today something new .. the number of synonyms returned for a word may be in the range of hundreds, sometimes even thousands. So I'm not sure query-time synonyms may work at all .. what do you think? Shai On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote: Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms? -- Jack Krupansky
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712297#comment-13712297 ] Markus Jelsma commented on SOLR-4260: - FYI: we're still seeing major inconsistencies, facet counts are off, and when inspecting leaders and replicas we notice not all are in sync. This is on yesterday's trunk and with an empty index. There were no node failures during indexing. Shard_b's stats for example:

node 2 shard b
{code}
Last Modified: about a minute ago
Num Docs: 158964
Max Doc: 158964
Deleted Docs: 0
Version: 4479
Segment Count: 1
{code}

node 3 shard b
{code}
Last Modified: 2 minutes ago
Num Docs: 158298
Max Doc: 158298
Deleted Docs: 0
Version: 2886
{code}

Size and versions are also different. The cluster was optimized/forceMerged, but as expected that doesn't change the facts. At least one other shard also has differences between its two replicas; I haven't manually checked the others. Inconsistent numDocs between leader and replica --- Key: SOLR-4260 URL: https://issues.apache.org/jira/browse/SOLR-4260 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2013.01.04.15.31.51 Reporter: Markus Jelsma Priority: Critical Fix For: 5.0 After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer, we see inconsistencies between the leader and replica for some shards. Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents. The leader and replica deviate by roughly 10-20 documents, not more. Results hopping ranks in the result set for identical queries got my attention; there were small IDF differences for exactly the same record, causing a record to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different numbers of numDocs. 
We're running a 10 node test cluster with 10 shards and a replication factor of two and frequently reindex using a fresh build from trunk. I've not seen this issue for quite some time until a few days ago. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: for those of you using gmail...
On Wed, Jul 17, 2013 at 10:02 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: And seems that it returns no result for query:
: from:jenk...@thetaphi.de subject:build 6605 ANY_WORD_NOT_IN_TITLE
: Maybe for some mails, only the title field is taken into consideration?

Ah ... interesting. I wonder if maybe something about the content of the jenkins emails + the multipart/mixed wrapping done by ezmlm occasionally causes gmail to balk at trying to parse the various parts of *some* jenkins emails (like maybe just the ones with multi-byte characters?), so you are left with only the headers being searchable? That still doesn't explain this discrepancy though ...

Maybe it's because build 6605 was a biiig email (1.9 MB), and Google punted on indexing any text whatsoever from its body? I see this in the build 6605 email: [Message clipped] View entire message

: from:jenk...@thetaphi.de regression
: I only get results up to Jul 2, even though there are many build failures after that.
: A recent search got results up to #6530. Still no 6605.

Mike says the newest email gmail will return from that search is Jul 2, but Han, myself (and IIRC several other people) are all seeing lots of results since then ... just not all of them, notably the specific one Mike asked about (Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b96) - Build # 660, sent 13 hours ago). In fact, now when I run this search, I'm seeing additional results after Jul 2! And they seem to be the smaller emails.

Mike McCandless
http://blog.mikemccandless.com
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Maybe a custom search component would be in order, to "enrich" the incoming query. Again, preprocessing the query for synonym expansion before Solr parses it. It could call the external synonym API and cache synonyms as well. But I'd still lean towards preprocessing in an application layer. Although, for hundreds or thousands of synonyms it would probably hit the common 2048-character limit for URLs in some containers, which would need to be raised.

-- Jack Krupansky

From: Shai Erera
Sent: Thursday, July 18, 2013 8:54 AM
To: dev@lucene.apache.org
Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

The examples I've seen so far are single words. But I learned something new today .. the number of synonyms returned for a word may be in the range of hundreds, sometimes even thousands. So I'm not sure query-time synonyms would work at all .. what do you think?

Shai

On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote:

Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms?

-- Jack Krupansky

From: Shai Erera
Sent: Thursday, July 18, 2013 1:36 AM
To: dev@lucene.apache.org
Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

Hi

I was asked to integrate with a system which provides synonyms for words through an API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front. E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API.
Solr's SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either. The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word). So I have a few questions:

1) Did I miss a Filter which does take a programmatic syn-map to which I can provide my own impl?
2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl, so that users can provide their own impl, e.g. one not requiring everything to be in RAM?
2.1) A side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of many use cases for not using SynonymFilter as-is ...
3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right?

Shai
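The pluggable lookup from questions (2)/(3) can be prototyped outside Lucene first. Below is a stdlib-only Java sketch of that idea: SynonymProvider and CachingSynonymProvider are hypothetical names (not Lucene or Solr APIs), and the lambda stands in for the external String[] getSynonyms(String word) service described above.

```java
// Prototype of a per-word, on-demand synonym lookup. Nothing is
// materialized in RAM up front, unlike SynonymMap's FST.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymProviderSketch {
    public static void main(String[] args) {
        // Stand-in for the remote synonym DB.
        SynonymProvider remote = word ->
            word.equals("fast") ? Arrays.asList("quick", "rapid") : Arrays.<String>asList();
        SynonymProvider cached = new CachingSynonymProvider(remote);
        System.out.println(cached.getSynonyms("fast")); // [quick, rapid]
        System.out.println(cached.getSynonyms("slow")); // []
    }
}

// Mirrors the external API: synonyms are pulled per word, on demand.
interface SynonymProvider {
    List<String> getSynonyms(String word);
}

// Caches lookups so the huge external DB is queried at most once per
// distinct token. A real impl would bound the cache (e.g. LRU).
class CachingSynonymProvider implements SynonymProvider {
    private final SynonymProvider delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingSynonymProvider(SynonymProvider delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> getSynonyms(String word) {
        return cache.computeIfAbsent(word, delegate::getSynonyms);
    }
}
```

A custom TokenFilter could then consult such a provider in incrementToken(), though as noted later in the thread, a single-word lookup cannot handle multi-word synonyms.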
[jira] [Created] (SOLR-5047) Color Shard/Collection Graph Nodes Based on Child Node Statuses
Thomas Murphy created SOLR-5047:

Summary: Color Shard/Collection Graph Nodes Based on Child Node Statuses
Key: SOLR-5047
URL: https://issues.apache.org/jira/browse/SOLR-5047
Project: Solr
Issue Type: Improvement
Components: web gui
Reporter: Thomas Murphy
Priority: Trivial

In the Solr Admin UI, only the leaf (individual core) nodes have colored statuses, leaving collections and shards as no-context nodes. Having status information for collections and shards would improve an administrator's ability to recognize which collections and shards are affected by server downtime on certain cores.

With increasing severity, the current core statuses are: Active, Recovering, Down, Recovery Failed, Gone.

The simplest plan:
* shards inherit the best status of their cores; one functioning core of that shard implies that the shard is functional
* collections inherit the worst status of their shards; one missing shard implies that the collection is not able to access data

More complicated, but accurate, would be appropriate indication of partially failed shards and their influence on the total health of the collection.
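The two inheritance rules above fall straight out of the severity ordering the issue lists. A small self-contained Java sketch (hypothetical helper, not actual Admin UI code) shows the rollup:

```java
// Statuses are declared in increasing severity, so "best" is the enum
// minimum and "worst" is the maximum under natural (declaration) order.
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StatusRollup {
    // Severity order from the issue: Active < Recovering < Down < Recovery Failed < Gone.
    enum Status { ACTIVE, RECOVERING, DOWN, RECOVERY_FAILED, GONE }

    // One functioning core implies the shard is functional: take the best status.
    static Status shardStatus(List<Status> coreStatuses) {
        return Collections.min(coreStatuses);
    }

    // One missing shard implies the collection cannot access data: take the worst.
    static Status collectionStatus(List<Status> shardStatuses) {
        return Collections.max(shardStatuses);
    }

    public static void main(String[] args) {
        Status shardA = shardStatus(Arrays.asList(Status.ACTIVE, Status.DOWN));
        Status shardB = shardStatus(Arrays.asList(Status.GONE, Status.GONE));
        System.out.println(shardA); // ACTIVE: one core still serves this shard
        System.out.println(collectionStatus(Arrays.asList(shardA, shardB))); // GONE
    }
}
```

The "more complicated, but accurate" variant would need a richer result than a single enum value, e.g. a fraction of healthy replicas per shard.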
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712301#comment-13712301 ]

Joel Bernstein commented on SOLR-4816:

Markus, thanks for the info. Glad to hear it's working for you in production. Just wondering if you've turned on parallel updates and what batch size you're using? I'm thinking that large batch sizes with parallel updates would be very beneficial for performance. That way you would get long stretches of parallel indexing across the cluster. I suspect that UpdateRequestExt will eventually get folded into UpdateRequest, based on the comments in the source. I'll ping Mark and see what he thinks about getting this committed.

Add document routing to CloudSolrServer
Key: SOLR-4816
URL: https://issues.apache.org/jira/browse/SOLR-4816
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
Fix For: 5.0, 4.4
Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch

This issue adds the following enhancements to CloudSolrServer's update logic:
1) Document routing: Updates are routed directly to the correct shard leader, eliminating document routing at the server.
2) Optional parallel update execution: Updates for each shard are executed in a separate thread so parallel indexing can occur across the cluster.
These enhancements should allow for near-linear scalability of indexing throughput.

Usage:
{code}
CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
cloudClient.setParallelUpdates(true);
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "0");
doc1.addField("a_t", "hello1");
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "2");
doc2.addField("a_t", "hello2");
UpdateRequest request = new UpdateRequest();
request.add(doc1);
request.add(doc2);
request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
NamedList response = cloudClient.request(request);
// Returns a backwards compatible condensed response.
// To get a more detailed response, down-cast to RouteResponse:
CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;
{code}
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712300#comment-13712300 ]

Mark Miller commented on SOLR-4260:

See anything in the logs about zk expirations?
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712314#comment-13712314 ]

Markus Jelsma commented on SOLR-4260:

I've already restarted the job and enabled logging! It's going to take a while :)
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712318#comment-13712318 ]

Markus Jelsma commented on SOLR-4816:

Batch size is about 394 iirc, not very large indeed. I don't think I enabled parallel updates.
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712321#comment-13712321 ]

Joel Bernstein commented on SOLR-4816:

Thanks Mark.
[jira] [Updated] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-4816:
Fix Version/s: (was: 4.4)
               4.5

It's def too late for 4.4 (we already branched and the first rc vote is ongoing), but high priority for 4.5.
Re: [VOTE] Release 4.4
Bah. You are right. We should respin for this. I'll update it.

- Mark

On Jul 18, 2013, at 1:06 AM, Jack Krupansky j...@basetechnology.com wrote:

-1

In the Solr example solrconfig.xml:

<luceneMatchVersion>LUCENE_43</luceneMatchVersion>

That should be:

<luceneMatchVersion>LUCENE_44</luceneMatchVersion>

Otherwise the 4.4 changes to the EdgeNGramTokenizer/Filter are disabled in the Solr example config. See: https://issues.apache.org/jira/browse/LUCENE-3907 (Eliminated side=back).

-- Jack Krupansky

-----Original Message-----
From: Steve Rowe
Sent: Tuesday, July 16, 2013 2:32 AM
To: dev@lucene.apache.org
Subject: [VOTE] Release 4.4

Please vote to release Lucene and Solr 4.4, built off revision 1503555 of https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_4. RC0 artifacts are available at: http://people.apache.org/~sarowe/staging_area/lucene-solr-4.4.0-RC0-rev1503555

The smoke tester passes for me. Here's my +1.

Steve
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712323#comment-13712323 ]

Joel Bernstein commented on SOLR-4816:

Markus, cloudClient.setParallelUpdates(true); will turn on parallel updates; this in theory should give you much better performance. Depending on the size of docs you could probably go with a pretty high batch size. With ten servers, a batch size of 5000 would send roughly 500 docs to each server.
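Joel's sizing suggestion is just even-spread arithmetic. A tiny sketch (hypothetical helper, not SolrJ code) makes the assumption explicit:

```java
// Assumes hash-based document routing spreads a batch evenly across
// shard leaders; real id distributions only approximate this.
public class BatchSizing {
    static int docsPerShard(int batchSize, int numShards) {
        return batchSize / numShards;
    }

    public static void main(String[] args) {
        // Joel's example: a batch of 5000 across ten servers.
        System.out.println(BatchSizing.docsPerShard(5000, 10)); // 500
    }
}
```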
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Actually, after chatting w/ Mike about it, he made a good point about making SynMap expose an API like lookup(word): that doesn't work with multi-word synonyms (e.g. wi fi -> wifi). So I no longer think we should change SynFilter. Since in my case it's 1:1 (so much I've learned so far), I should write my own TokenFilter.

So now the question is whether to do it at indexing time or search time. Each has pros and cons. I'll need to learn more about the DB first, e.g. how many words have only tens of synonyms and how many thousands. I suspect there's no single solution here, so I will need to experiment with both.

Jack, I didn't quite follow the 2048 limit -- is it a Solr limit of some sort? If so, can you please elaborate?

Shai
Re: 4.0 and 4.1 FieldCacheImpl.DocTermsImpl.exists(docid) possibly broken
Thanks Doron, that's definitely completely backwards!! Good thing the API is gone.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Jul 18, 2013 at 7:50 AM, Doron Cohen cdor...@gmail.com wrote:

Hi, just an FYI - may be helpful for anyone obliged to use 4.0.0 or 4.1.0 - it seems that this method is actually doing the opposite of its intention. I did not find mentions of this in the lists or elsewhere. This is the code for o.a.l.search.FieldCacheImpl.DocTermsImpl.exists(int):

{code}
public boolean exists(int docID) {
  return docToOffset.get(docID) == 0;
}
{code}

Its description says: "Returns true if this doc has this field and is not deleted." But it returns true for docs not containing the field and false for those that do contain it. A simple workaround is not to call this method before calling getTerm(), but rather just rely on getTerm()'s logic: "... returns the same BytesRef, or an empty (length=0) BytesRef if the doc did not have this field or was deleted." So usage code can be like this:

{code}
DocTerms values = FieldCache.DEFAULT.getTerms(reader, FIELD_NAME);
BytesRef term = new BytesRef();
for (int docid = 0; docid < values.size(); docid++) {
  term = values.getTerm(docid, term);
  if (term.length > 0) {
    doSomethingWith(term.utf8ToString());
  }
}
FieldCache.DEFAULT.purge(reader);
{code}

I am not sure about the overhead of this compared to first checking exists(), but it at least works correctly. The code for exists() was as above until R1442497 (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?revision=1442497&view=markup) and then in R1443717 (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?r1=1442497&r2=1443717&diff_format=h) the API was changed as part of LUCENE-4547 (DocValues improvements), which was included in 4.2.
Simple code to demonstrate this (here with 4.1 but same results with 4.0):

{code}
RAMDirectory d = new RAMDirectory();
IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_41,
    new SimpleAnalyzer(Version.LUCENE_41)));
w.addDocument(new Document()); // Empty doc (0, 0)
Document doc = new Document(); // Real doc (1, 1)
doc.add(new StringField("f1", "v1", Store.NO));
w.addDocument(doc);
w.addDocument(new Document()); // Empty doc (2, 2)
w.addDocument(new Document()); // Empty doc (3, 3)
w.commit(); // Commit - so we'll have two atomic readers
doc = new Document(); // Real doc (0, 4)
doc.add(new StringField("f1", "v2", Store.NO));
w.addDocument(doc);
w.addDocument(new Document()); // Empty doc (1, 5)
w.close();
IndexReader r = DirectoryReader.open(d);
BytesRef br = new BytesRef();
for (AtomicReaderContext leaf : r.leaves()) {
  System.out.println("--- new atomic reader");
  AtomicReader reader = leaf.reader();
  DocTerms a = FieldCache.DEFAULT.getTerms(reader, "f1");
  for (int i = 0; i < reader.maxDoc(); ++i) {
    int n = leaf.docBase + i;
    System.out.println(n + " exists: " + a.exists(i));
    br = a.getTerm(i, br);
    if (br.length > 0) {
      System.out.println(n + " " + br.utf8ToString());
    }
  }
}
{code}

The result printing:

{code}
--- new atomic reader
0 exists: true
1 exists: false
1 v1
2 exists: true
3 exists: true
--- new atomic reader
4 exists: false
4 v2
5 exists: true
{code}

Indeed, exists() results are wrong. So again, just an FYI, as this API no longer exists ...

Regards,
Doron
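The inversion Doron describes can be seen without Lucene at all. The following stdlib-only sketch is a hypothetical stand-in (ExistsInversion, existsBroken, existsIntended are made-up names); only the `docToOffset.get(docID) == 0` expression mirrors the quoted 4.0/4.1 code:

```java
// Offset 0 plays the role of "no term stored for this doc", so testing
// `== 0` answers the exact opposite of "does this doc have the field?".
import java.util.HashMap;
import java.util.Map;

public class ExistsInversion {
    static Map<Integer, Integer> docToOffset = new HashMap<>();

    static int offset(int docID) {
        return docToOffset.getOrDefault(docID, 0);
    }

    // The 4.0/4.1 body: true exactly for docs WITHOUT the field.
    static boolean existsBroken(int docID) {
        return offset(docID) == 0;
    }

    // The documented intent: a non-zero offset means the field is present.
    static boolean existsIntended(int docID) {
        return offset(docID) != 0;
    }

    public static void main(String[] args) {
        docToOffset.put(1, 7); // doc 1 has a term; doc 0 does not
        System.out.println("broken:   doc0=" + existsBroken(0) + " doc1=" + existsBroken(1));
        System.out.println("intended: doc0=" + existsIntended(0) + " doc1=" + existsIntended(1));
    }
}
```

This matches the demo's output above: the broken check reports the empty docs as existing and the real docs as missing.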
Re: Contentions observed in lucene execution
Lucene 2.4.x is quite ancient by now ... FSDirectory.FSIndexInput is single-threaded in seeking/reading bytes, which I think explains your 1 and 4. Try using MMapDirectory, if you are using a 64 bit JVM or if your index is tiny. Newer Lucene versions also have NIOFSDirectory, which is thread-friendly on Unix (but not on Windows due to a JVM bug). For 2 and 3, creating a FieldCache entry is also single-threaded, but this is a one-time event on the first search to the IndexReader requiring that entry. Lucene 4.x adds doc values, which are much more efficient to init at search time. But, what changed in your app? Perhaps there's less RAM available to the OS for caching IO pages (this could explain 1 and 4)? Mike McCandless http://blog.mikemccandless.com On Thu, Jul 18, 2013 at 6:46 AM, RameshIyerV rameshiy...@hotmail.com wrote: Hi All, I need some help in analyzing some contentions we observe in the Lucene execution. We are supporting the Sterling 9.0 fulfillment application, and it uses Lucene 2.4 for catalog search functionality. ---The Issue--- This system has been live in production since Nov 2012, and only recently (mid June 2013) has our application started forming stuck threads during Lucene invocations; this causes our application to crash. This occurs 2 - 3 times a week; on other days we see spikes of very slow performance at the exact places that cause stuck threads. ---The research--- We have validated that the data and the usage have not grown between Jan 2012 and now. We took a snapshot of the code execution (through VisualVM), and for slow-running threads we validated that too much time is spent at certain spots (these very same spots appear in the stack trace of the stuck threads). ---Help needed--- If you can guide me on what kind of contentions (heap, IO, data, CPU, JVM params) can cause such a behavior, it will really help. ---Lucene invocation contentions observed--- (We find stuck threads / slowness at the following spots, ordered by severity [high to low]) 1. 
java.io.RandomAccessFile.readBytes(Native Method) java.io.RandomAccessFile.read(RandomAccessFile.java:338) org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596) org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136) 2. org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122) org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:167) org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:373) org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71) org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351) 3. org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80) org.apache.lucene.index.TermBuffer.read(TermBuffer.java:65) org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127) org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:389) org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71) org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351) 4. java.io.RandomAccessFile.seek(Native Method) org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591) org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136) -- View this message in context: http://lucene.472066.n3.nabble.com/Contentions-observed-in-lucene-execution-tp4078796.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
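The seek/read contention Mike points to (spots 1 and 4) comes from RandomAccessFile keeping a single shared file pointer, so FSIndexInput must serialize seek+read pairs across threads; NIOFSDirectory avoids this by using positional reads that never touch a shared pointer. A minimal java.nio illustration of that idea follows - it is a sketch of the positional-read technique only, not Lucene code, and the class and method names are made up:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadDemo {
    // Reads `len` bytes at absolute offset `pos`. The (ByteBuffer, long)
    // overload of FileChannel.read does not move any shared file pointer,
    // so concurrent readers need no lock around a seek.
    public static String readAt(Path file, long pos, int len) throws Exception {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            ch.read(buf, pos); // positional read: no seek, no shared state
            return new String(buf.array(), "US-ASCII");
        }
    }

    public static String demo() throws Exception {
        Path p = Files.createTempFile("demo", ".bin");
        try {
            Files.write(p, "hello world".getBytes("US-ASCII"));
            return readAt(p, 6, 5); // bytes 6..10 of "hello world"
        } finally {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

With RandomAccessFile, the equivalent access would be seek(6) followed by read(), and those two calls must be guarded by one lock per file, which is exactly the serialization showing up in the stack traces above.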
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Container (e.g., Tomcat) limit. Configurable. I don’t recall the specifics. -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 9:46 AM To: dev@lucene.apache.org Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr) Actually, after chatting w/ Mike about it, he made a good point about making SynMap expose API like lookup(word), because that doesn't work with multi-word synonyms (e.g. wi fi - wifi). So I no longer think we should change SynFilter. Since in my case it's 1:1 (so much I learned so far), I should write my own TokenFilter. So now the question is whether to do it at indexing time or search time. Each has pros and cons. I'll need to learn more about the DB first, e.g. how many words have only tens of synonyms and how many thousands. I suspect there's no single solution here, so will need to experiment with both. Jack, I didn't quite follow the 2048 common limit -- is it a Solr limit of some sort? If so, can you please elaborate? Shai On Thu, Jul 18, 2013 at 4:12 PM, Jack Krupansky j...@basetechnology.com wrote: Maybe a custom search component would be in order, to “enrich” the incoming query. Again, preprocessing the query for synonym expansion before Solr parses it. It could call the external synonym API and cache synonyms as well. But, I’d still lean towards preprocessing in an application layer. Although, for hundreds or thousands of synonyms it would probably hit the 2048 common limit for URLs in some containers, which would need to be raised. -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 8:54 AM To: dev@lucene.apache.org Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr) The examples I've seen so far are single words. But I learned today something new .. the number of synonyms returned for a word may be in the range of hundreds, sometimes even thousands. So I'm not sure query-time synonyms may work at all .. what do you think? 
Shai On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote: Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms? -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 1:36 AM To: dev@lucene.apache.org Subject: Programmatic Synonyms Filter (Lucene and/or Solr) Hi I was asked to integrate with a system which provides synonyms for words through an API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front. E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API. Solr's SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either. The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word). So I have a few questions: 1) Did I miss a Filter which does take a programmatic syn-map which I can provide my own impl to? 2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl, so that users can provide their own impl, e.g. not requiring everything to be in RAM? 2.1) A side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of many use cases for not using SynonymFilter as-is ... 
3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right? Shai
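For the 1:1 single-word case Shai describes, the piece that actually needs care is the lookup layer: the DB is too big for RAM, so a custom TokenFilter would wrap the remote getSynonyms(word) call behind a bounded cache. Here is a hedged sketch of just that layer in plain Java - SynonymService, CachedSynonymService, and the cache size are all hypothetical names for illustration, not Lucene or Solr APIs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CachedSynonymService {
    // Stand-in for the external API described in the thread:
    // String[] getSynonyms(String word).
    public interface SynonymService {
        String[] getSynonyms(String word);
    }

    private final SynonymService backend;
    private final LinkedHashMap<String, String[]> cache;

    public CachedSynonymService(SynonymService backend, final int maxEntries) {
        this.backend = backend;
        // Access-ordered LinkedHashMap doubling as a small LRU cache:
        // evicts the least-recently-used word once maxEntries is exceeded.
        this.cache = new LinkedHashMap<String, String[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized String[] getSynonyms(String word) {
        String[] syns = cache.get(word);
        if (syns == null) {
            syns = backend.getSynonyms(word); // remote lookup only on a cache miss
            cache.put(word, syns);
        }
        return syns;
    }
}
```

A custom TokenFilter's incrementToken() would then consult this service per term, keeping RAM bounded regardless of DB size; whether that is done at index time or query time is the trade-off discussed above.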
[jira] [Commented] (SOLR-4860) MoreLikeThisHandler doesn't work with numeric or date fields in 4.x
[ https://issues.apache.org/jira/browse/SOLR-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712365#comment-13712365 ] Yonik Seeley commented on SOLR-4860: It's unfortunately a lucene limitation - numeric type fields no longer work since they are encoded so differently (using a different attribute rather than a text attribute). I think we should probably just ignore numeric-type fields. MoreLikeThisHandler doesn't work with numeric or date fields in 4.x --- Key: SOLR-4860 URL: https://issues.apache.org/jira/browse/SOLR-4860 Project: Solr Issue Type: Bug Components: MoreLikeThis Affects Versions: 4.2 Reporter: Thomas Seidl After upgrading to Solr 4.2 (from 3.x), I realized that my MLT queries no longer work. It happens if I pass an integer ({{solr.TrieIntField}}), float ({{solr.TrieFloatField}}) or date ({{solr.DateField}}) field as part of the {{mlt.fl}} parameter. The field's {{multiValued}} setting doesn't seem to matter. This is the error I get: {noformat} NumericTokenStream does not support CharTermAttribute. java.lang.IllegalArgumentException: NumericTokenStream does not support CharTermAttribute. 
at org.apache.lucene.analysis.NumericTokenStream$NumericAttributeFactory.createAttributeInstance(NumericTokenStream.java:136) at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271) at org.apache.lucene.queries.mlt.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:781) at org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:724) at org.apache.lucene.queries.mlt.MoreLikeThis.like(MoreLikeThis.java:578) at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:348) at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:167) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:679) {noformat} The configuration I use can be found here:
[jira] [Updated] (SOLR-4777) Handle SliceState in the Admin UI
[ https://issues.apache.org/jira/browse/SOLR-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-4777: Affects Version/s: 4.3 Fix Version/s: 4.5 Handle SliceState in the Admin UI - Key: SOLR-4777 URL: https://issues.apache.org/jira/browse/SOLR-4777 Project: Solr Issue Type: Improvement Components: SolrCloud, web gui Affects Versions: 4.3 Reporter: Anshum Gupta Fix For: 4.5 The Solr admin UI as of now does not take Slice state into account. We need to have that differentiated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4777) Handle SliceState in the Admin UI
[ https://issues.apache.org/jira/browse/SOLR-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-4777: Description: The Solr admin UI as of now does not take Slice state into account. We need to have that differentiated. There are three states: # The default is ACTIVE # CONSTRUCTION (used during shard splitting for new sub shards), and # INACTIVE - the parent shard is set to this state after the split is complete A slice/shard which is INACTIVE will not accept traffic (i.e. it will re-route traffic to sub shards) even though the nodes inside this shard show up as green. We should show the INACTIVE shards in a different color to highlight this behavior. was: The Solr admin UI as of now does not take Slice state into account. We need to have that differentiated. Handle SliceState in the Admin UI - Key: SOLR-4777 URL: https://issues.apache.org/jira/browse/SOLR-4777 Project: Solr Issue Type: Improvement Components: SolrCloud, web gui Affects Versions: 4.3 Reporter: Anshum Gupta Fix For: 4.5 The Solr admin UI as of now does not take Slice state into account. We need to have that differentiated. There are three states: # The default is ACTIVE # CONSTRUCTION (used during shard splitting for new sub shards), and # INACTIVE - the parent shard is set to this state after the split is complete A slice/shard which is INACTIVE will not accept traffic (i.e. it will re-route traffic to sub shards) even though the nodes inside this shard show up as green. We should show the INACTIVE shards in a different color to highlight this behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
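The three slice states and the routing behavior described in the issue can be sketched as a small enum. This is an illustrative model only, not SolrCloud code: the method names and the color choices (other than green, which the issue mentions) are hypothetical, and real SolrCloud routing has more nuance than a single predicate.

```java
public enum SliceState {
    ACTIVE, CONSTRUCTION, INACTIVE;

    // Per the issue: an INACTIVE parent shard re-routes traffic to its
    // sub-shards, even though its nodes may still look healthy ("green").
    public boolean acceptsDirectTraffic() {
        return this != INACTIVE;
    }

    // Hypothetical color mapping illustrating the proposed UI highlighting.
    public String uiColor() {
        switch (this) {
            case ACTIVE:       return "green";
            case CONSTRUCTION: return "yellow";
            default:           return "gray";
        }
    }
}
```

The point of the issue is exactly the mismatch the model exposes: a slice can be green at the node level yet INACTIVE at the slice level, so the UI needs to surface the slice state separately.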
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712377#comment-13712377 ] ASF subversion and git services commented on LUCENE-5030: - Commit 1504490 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1504490 ] LUCENE-5030: FuzzySuggester can optionally measure edits in Unicode code points instead of UTF8 bytes FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, run-suggest-benchmark.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712380#comment-13712380 ] ASF subversion and git services commented on LUCENE-5030: - Commit 1504492 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1504492 ] LUCENE-5030: FuzzySuggester can optionally measure edits in Unicode code points instead of UTF8 bytes FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, run-suggest-benchmark.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5030. Resolution: Fixed Fix Version/s: (was: 4.4) 4.5 OK I committed the last patch with a few small fixes: * Added @lucene.experimental to FuzzySuggester * Removed the added ctor (so we have just two ctors: the easy one, which uses all defaults, and the expert one, where you specify everything) * Removed System.out.printlns from the test Thanks Artem! FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Assignee: Michael McCandless Fix For: 5.0, 4.5 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, run-suggest-benchmark.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. 
See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712400#comment-13712400 ] Markus Jelsma commented on SOLR-4260: - Alright, nothing looks like ZooKeeper expirations; I grepped for expirations in the error log but there's nothing there. This indexing session did not produce as many inconsistencies as the previous one; there is only 1 shard of which one replica has 2 more documents. It won't fix itself. During indexing there were, as usual, errors such as autocommit causing one searcher too many, and timeouts talking to other nodes. Only 2 nodes report a Stopping Recovery For, of which one node actually has a replica of the inconsistent core. The other shard seems fine; both replicas have the same numDocs. Inconsistent numDocs between leader and replica --- Key: SOLR-4260 URL: https://issues.apache.org/jira/browse/SOLR-4260 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2013.01.04.15.31.51 Reporter: Markus Jelsma Priority: Critical Fix For: 5.0 After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer we see inconsistencies between the leader and replica for some shards. Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents. The leader and slave deviate by roughly 10-20 documents, not more. Results hopping ranks in the result set for identical queries got my attention; there were small IDF differences for exactly the same record, causing a record to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different numDocs. We're running a 10 node test cluster with 10 shards and a replication factor of two, and frequently reindex using a fresh build from trunk. I've not seen this issue for quite some time until a few days ago. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too
[ https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-5091: Assignee: David Smiley Modify SpanNotQuery to act as SpanNotNearQuery too -- Key: LUCENE-5091 URL: https://issues.apache.org/jira/browse/LUCENE-5091 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: 4.3.1 Reporter: Tim Allison Assignee: David Smiley Priority: Minor Fix For: 4.5 Attachments: LUCENE-5091.patch.txt With very small modifications, SpanNotQuery can act as a SpanNotNearQuery. To find a but not if b appears 3 tokens before or 4 tokens after a: new SpanNotQuery(a, b, 3, 4) Original constructor still exists and calls SpanNotQuery(a, b, 0, 0). Patch with tests on way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
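The pre/post semantics proposed in the issue - reject a match of `a` if `b` appears up to `pre` tokens before or `post` tokens after it - can be sketched on plain integer positions. This is a rough model of the widened exclusion window only, with hypothetical names; the exact boundary conventions in the actual patch may differ, and real Spans carry more state than start/end ints.

```java
public class SpanNotWindow {
    // Accepts an `a` match at token positions [aStart, aEnd) unless some `b`
    // position falls inside the window widened by `pre` tokens before and
    // `post` tokens after. pre = post = 0 reduces to plain overlap exclusion,
    // matching the issue's note that the original constructor calls
    // SpanNotQuery(a, b, 0, 0).
    public static boolean accept(int aStart, int aEnd, int[] bPositions,
                                 int pre, int post) {
        for (int b : bPositions) {
            if (b >= aStart - pre && b < aEnd + post) {
                return false; // a `b` occurrence inside the window kills the match
            }
        }
        return true;
    }
}
```

So for the issue's example, new SpanNotQuery(a, b, 3, 4) corresponds to pre = 3, post = 4: any `b` from 3 tokens before `a` through 4 tokens after it suppresses the match.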
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712408#comment-13712408 ] Uwe Schindler commented on LUCENE-5030: --- JUH! :-) Thanks for heavy committing - it took a long time, but now it is good! Many thanks, Uwe FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Assignee: Michael McCandless Fix For: 5.0, 4.5 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, run-suggest-benchmark.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/ibm-j9-jdk7) - Build # 6622 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6622/ Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 1 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace: java.lang.AssertionError: No SolrDynamicMBeans found at __randomizedtesting.SeedInfo.seed([ABCC33B5C702AC02:251D578FAA43F467]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:613) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:780) Build Log: [...truncated 9411 lines...] [junit4] Suite:
Re: list-unsubscr...@apache.org
: Subject: list-unsubscr...@apache.org If anyone wishes to unsubscribe, you need to *send* an email to the unsubscribe address, not put it in the subject of a reply. The specifics of how to unsubscribe are listed in the footer of every email to the list(s) you are on... : - : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: dev-h...@lucene.apache.org -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4542: - Assignee: Adrien Grand (was: Chris Male) Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Adrien Grand Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (f.ex. it costs 36ms to stem a long sentence in Latvian for recursion_cap=2 and 5ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue in my life, so please forgive me any mistakes.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
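[Editor's note] The performance gap reported above (36ms at recursion_cap=2 vs 5ms at recursion_cap=1) follows from the fan-out of recursive stemming. A toy sketch — not the Hunspell code itself — showing how the amount of explored work grows with the cap, and how a constructor parameter replaces the static final constant:

```java
// Hypothetical illustration: a recursion cap taken as a constructor
// parameter instead of a hard-coded "private static final int" constant.
public class RecursionCapDemo {
    private final int recursionCap; // configurable, per the issue's request

    public RecursionCapDemo(int recursionCap) {
        this.recursionCap = recursionCap;
    }

    /** Counts how many candidate expansions are explored for a given fan-out. */
    public int countExpansions(int fanOut) {
        return expand(fanOut, 0);
    }

    private int expand(int fanOut, int depth) {
        if (depth >= recursionCap) {
            return 0; // the cap cuts off further expansion
        }
        int explored = 0;
        for (int i = 0; i < fanOut; i++) {
            explored += 1 + expand(fanOut, depth + 1); // this candidate plus its children
        }
        return explored;
    }

    public static void main(String[] args) {
        // With a fan-out of 10 candidate stems per level, cap=1 explores 10
        // candidates while cap=2 explores 110 — roughly the order-of-magnitude
        // cost difference reported in the issue.
        System.out.println(new RecursionCapDemo(1).countExpansions(10));
        System.out.println(new RecursionCapDemo(2).countExpansions(10));
    }
}
```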
[jira] [Commented] (LUCENE-4734) FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
[ https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712427#comment-13712427 ] Ryan Lauck commented on LUCENE-4734: Thanks Adrien! I agree about LUCENE-2878. I came to the same conclusion before finding that someone had already done most of the work: the ideal scenario is to (optionally) pull postings or term vectors in addition to payloads while scoring and expose them for highlighting. I'm looking forward to that patch too! An idea I began working on but haven't polished enough to submit a patch for: users of the API could access raw highlight metadata (offsets and positions) and could additionally process it to merge/filter/ignore overlapping highlights. One flaw I've had to work around in existing highlighters is that when highlights overlap, they either merge them or toss all but the first encountered. We perform the highlighting manually in our system and hope to one day allow end users to toggle which terms are highlighted without having to make round-trips to the server to modify the search criteria and rerun the highlighter. With raw offset data this is trivial, and merging/discarding overlaps can be handled in client-side code. There are additional advantages too, such as being able to highlight find-in-page or search-within-search results and only having to transfer new offset metadata rather than the entire text over the wire (we have some very big 100MB+ documents). FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight Key: LUCENE-4734 URL: https://issues.apache.org/jira/browse/LUCENE-4734 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0, 4.1, 5.0 Reporter: Ryan Lauck Labels: fastvectorhighlighter, highlighter Fix For: 4.4 Attachments: lucene-4734.patch, LUCENE-4734.patch If a proximity phrase query overlaps with any other query term it will not be highlighted.
Example Text: A B C D E F G Example Queries: "B E"~10 D (D will be highlighted instead of B C D E) "B E"~10 "C F"~10 (nothing will be highlighted) This can be traced to the FieldPhraseList constructor's inner while loop. In the first example query, the first TermInfo popped off the stack will be B. The second TermInfo will be D, which will not be found in the submap for "B E"~10 and will trigger a failed match. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712432#comment-13712432 ] Erick Erickson commented on SOLR-4478: -- OK, reconstructing a chat exchange: sharing the underlying solrconfig objects looks like it's more difficult than I thought, with some interesting corner cases that would be difficult, i.e. ${} substitutions, the resource loader being shared, etc. Also, the individual core properties are embedded in the Config object, so keeping these separate is another source of getting code wrong. Not to mention that the code changes would be more extensive than anyone had hoped. At least the use-case of opening a core, actively using it for a while, then moving on is handled by the lazy/transient core capabilities. There is historical evidence that a significant amount of CPU resources is consumed by opening/closing cores 100s of times a second, so that scenario is still out there. The net-net is that it's probably not worth the effort right now to really share the underlying solrConfig object across cores; there are too many ways to go wrong. The refactoring that's been done should make this easier if we decide to do it in the future. Allow cores to specify a named config set - Key: SOLR-4478 URL: https://issues.apache.org/jira/browse/SOLR-4478 Project: Solr Issue Type: Improvement Affects Versions: 4.2, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4478.patch, SOLR-4478.patch Part of moving forward to the new way, after SOLR-4196 etc... I propose an additional parameter specified on the core node in solr.xml or as a parameter in the discovery mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core. Straw-man: There will be a directory solr_home/configsets which will be the default.
If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like solr_home/configsets/myconf/schema.xml solrconfig.xml stopwords.txt velocity velocity/query.vm etc. If multiple cores used the same configSet, schema, solrconfig etc. would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored but maybe log a warning? Mostly I'm putting this up for comments. I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going. Configset can be either a relative or absolute path, if relative it's assumed to be relative to solr_home. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-460) hashCode improvements
[ https://issues.apache.org/jira/browse/LUCENE-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712434#comment-13712434 ] David Smiley commented on LUCENE-460: - I am not an expert on hashCode generation, yet like any Java developer I have to generate hash codes. I typically leave this to my IDE, IntelliJ. As I find the need to update a hashCode, *do you think it's bad form for me to outright replace an existing hashCode implementation you wrote that looks complicated to me with what IntelliJ generates?* Here's a specific example: SpanNotQuery formerly:
{code:java}
int h = include.hashCode();
h = (h << 1) | (h >>> 31);  // rotate left
h ^= exclude.hashCode();
h = (h << 1) | (h >>> 31);  // rotate left
h ^= Float.floatToRawIntBits(getBoost());
return h;
{code}
IntelliJ will generate a hashCode for this plus the new pre/post pair of integer fields I'm adding via LUCENE-5091:
{code:java}
int result = super.hashCode();
result = 31 * result + include.hashCode();
result = 31 * result + exclude.hashCode();
result = 31 * result + pre;
result = 31 * result + post;
return result;
{code}
Now that's a hashCode implementation I can understand, and I don't question its validity because IntelliJ always generates them in a consistent fashion that I am used to seeing. Your hashCode might be better, but I simply don't understand it and thus can't maintain it. Do you want me to consult you (or the applicable author of a confusing hashCode in general) every time? Granted, this doesn't happen often. hashCode improvements - Key: LUCENE-460 URL: https://issues.apache.org/jira/browse/LUCENE-460 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Yonik Seeley Assignee: Yonik Seeley Priority: Minor It would be nice for all Query classes to implement hashCode and equals to enable them to be used as keys when caching. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
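[Editor's note] A minimal, self-contained sketch (not Lucene code) of why the IDE-generated 31-multiplier chain discussed above is a reasonable default: it is order-sensitive, so two int fields (such as the pre/post pair from LUCENE-5091) combined with a symmetric operation like XOR collide when their values are swapped, while the multiplier chain does not.

```java
// Demonstrates order sensitivity of hash combining strategies.
public class HashCombineDemo {
    // Symmetric combination: pre=3,post=4 and pre=4,post=3 hash identically.
    static int symmetricHash(int pre, int post) {
        return pre ^ post;
    }

    // IDE-style 31-multiplier chain: each field is mixed in at a different
    // "position", so swapping the two values changes the result.
    static int multiplierHash(int pre, int post) {
        int result = 17; // arbitrary non-zero seed
        result = 31 * result + pre;
        result = 31 * result + post;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(symmetricHash(3, 4) == symmetricHash(4, 3));   // collision
        System.out.println(multiplierHash(3, 4) == multiplierHash(4, 3)); // no collision
    }
}
```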
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712436#comment-13712436 ] Erick Erickson commented on SOLR-4478: -- I just had a bright idea, so I'll put it out there so someone can shoot it down. It seems like sharing the underlying solrConfig object is fraught with problems, but could we get an easy win by just sharing the parsed DOM object in each config set (really, the same for the schema object)? I don't have any measurements for what percentage of loading the schema object is spent in raw XML parsing, so I can't really say how much of a win this would be. But if it's easy/safe it might be worth considering. Allow cores to specify a named config set - Key: SOLR-4478 URL: https://issues.apache.org/jira/browse/SOLR-4478 Project: Solr Issue Type: Improvement Affects Versions: 4.2, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4478.patch, SOLR-4478.patch
[jira] [Commented] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too
[ https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712440#comment-13712440 ] David Smiley commented on LUCENE-5091: -- Looks good, Tim, except for one thing: the way you incorporated pre and post into the hashCode is bad, as another unequal query with the pre and post values flipped would have the same hashCode. I'm consulting the other devs on https://issues.apache.org/jira/browse/LUCENE-460?focusedCommentId=13712434&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13712434 about a suitable replacement, which will block me from committing this for the moment. Also, I updated the package.html summary with a new description matching the class javadocs:
{code:html}
<li>A {@link org.apache.lucene.search.spans.SpanNotQuery SpanNotQuery} removes spans matching one {@link org.apache.lucene.search.spans.SpanQuery SpanQuery} which overlap (or come near) another. This can be used, e.g., to implement within-paragraph search.</li>
{code}
Modify SpanNotQuery to act as SpanNotNearQuery too -- Key: LUCENE-5091 URL: https://issues.apache.org/jira/browse/LUCENE-5091 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: 4.3.1 Reporter: Tim Allison Assignee: David Smiley Priority: Minor Fix For: 4.5 Attachments: LUCENE-5091.patch.txt With very small modifications, SpanNotQuery can act as a SpanNotNearQuery. To find a but not if b appears 3 tokens before or 4 tokens after a: new SpanNotQuery(a, b, 3, 4) The original constructor still exists and calls SpanNotQuery(a, b, 0, 0). Patch with tests on the way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
Hoss Man created SOLR-5048: -- Summary: fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion Key: SOLR-5048 URL: https://issues.apache.org/jira/browse/SOLR-5048 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... the build should fail in a situation like this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5048: --- Attachment: SOLR-5048.patch all the necessary info is already in the build files ... we just need to tweak the format and check the files. fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion -- Key: SOLR-5048 URL: https://issues.apache.org/jira/browse/SOLR-5048 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5048.patch
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712459#comment-13712459 ] Uwe Schindler commented on SOLR-5048: - I would change the example solrconfig.xml file to use the {{x.y}} version format instead of {{LUCENE_xy}}. This would make the sanity check much simpler (you just have to check the configs for the {{${tests.luceneMatchVersion}}} string). The XML parser of Solr has supported the plain text format since the beginning of the {{Version}} class. fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion -- Key: SOLR-5048 URL: https://issues.apache.org/jira/browse/SOLR-5048 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5048.patch
[jira] [Created] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY
Hoss Man created SOLR-5049: -- Summary: switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY Key: SOLR-5049 URL: https://issues.apache.org/jira/browse/SOLR-5049 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.4 Uwe just pointed out to me on IRC that you can specify {{<luceneMatchVersion/>}} using X.Y instead of the more internal-java-variable-esque LUCENE_XY. i have no idea why we haven't been doing this in the past ... it makes so much more sense for end users, we should absolutely do this moving forward. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
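[Editor's note] Concretely, the switch proposed in SOLR-5049 changes the element in the example solrconfig.xml from the internal-constant form to the plain version form (the 4.4 value below is illustrative):

```xml
<!-- old form: mirrors the internal LUCENE_XY Version constant -->
<luceneMatchVersion>LUCENE_44</luceneMatchVersion>

<!-- proposed form: plain X.Y version string, friendlier for end users -->
<luceneMatchVersion>4.4</luceneMatchVersion>
```

Both forms are accepted by Solr's config parser per the discussion above; the X.Y form also makes the build-time version check in SOLR-5048 a simple string comparison.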
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712464#comment-13712464 ] Hoss Man commented on SOLR-5048: opened a blocker issue SOLR-5049 to switch to this better format in the example configs -- it's a good idea in general and drastically simplifies the check we have to do for this issue. fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion -- Key: SOLR-5048 URL: https://issues.apache.org/jira/browse/SOLR-5048 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5048.patch
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712470#comment-13712470 ] Uwe Schindler commented on SOLR-5048: - To conclude: the patch is fine, just remove the part inside common-build.xml and use the test property inside the Solr validate check. You have to fix the solrconfig files in any case to use the {{x.y}} format. fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion -- Key: SOLR-5048 URL: https://issues.apache.org/jira/browse/SOLR-5048 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5048.patch
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
There are two serious issues with query-time synonyms, speed and correctness. 1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. This will not be fast. Expanding the term at index time means 1000 posting list entries, but only one term lookup at query time. 2. Query time expansion will give higher scores to the more rare synonyms. This is almost never what you want. If I make TV and television synonyms, I want them both to score the same. But if TV is 10X more common than television, then documents with the rare term (television) will score better. wunder On Jul 18, 2013, at 5:54 AM, Shai Erera wrote: The examples I've seen so far are single words. But I learned today something new .. the number of synonyms returned for a word may be in the range of hundreds, sometimes even thousands. So I'm not sure query-time synonyms will work at all .. what do you think? Shai On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote: Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms? -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 1:36 AM To: dev@lucene.apache.org Subject: Programmatic Synonyms Filter (Lucene and/or Solr) Hi I was asked to integrate with a system which provides synonyms for words through an API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front. E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API.
Solr's SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either. The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word). So I have a few questions: 1) Did I miss a Filter which does take a programmatic syn-map, one I can provide my own impl to? 2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl, so that users can provide their own impl, e.g. not requiring everything to be in RAM? 2.1) A side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of many use cases for not using SynonymFilter as-is ... 3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right? Shai -- Walter Underwood wun...@wunderwood.org
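[Editor's note] Question (2) above — a pluggable getSynonyms(word) lookup that does not require the whole dictionary in RAM — can be sketched in plain Java. The SynonymProvider interface and the caching wrapper below are hypothetical names for illustration, not Lucene/Solr API; a real integration would sit behind a TokenFilter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pluggable lookup mirroring the remote API described above:
// String[] getSynonyms(String word).
interface SynonymProvider {
    String[] getSynonyms(String word);
}

// Bounds memory with an LRU cache while avoiding a remote round-trip
// for every repeated word, since the full DB cannot fit in RAM.
class CachingSynonymProvider implements SynonymProvider {
    private final SynonymProvider delegate;
    private final Map<String, String[]> cache;

    CachingSynonymProvider(SynonymProvider delegate, int maxEntries) {
        this.delegate = delegate;
        // accessOrder=true turns LinkedHashMap into an LRU cache
        this.cache = new LinkedHashMap<String, String[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String[]> e) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized String[] getSynonyms(String word) {
        String[] syns = cache.get(word);
        if (syns == null) {
            syns = delegate.getSynonyms(word); // one remote lookup per missing word
            cache.put(word, syns);
        }
        return syns;
    }
}
```

A token filter could then consult the provider per token instead of a preloaded SynonymMap, which is essentially the extension point the thread asks for.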
[jira] [Commented] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY
[ https://issues.apache.org/jira/browse/SOLR-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712479#comment-13712479 ] Uwe Schindler commented on SOLR-5049: - I am sorry for not making this public back in the 1.4 (or 3.0) days when I committed the lenient parser using the regex for the first time! :-) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY Key: SOLR-5049 URL: https://issues.apache.org/jira/browse/SOLR-5049 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.4
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712488#comment-13712488 ] Artem Lukanin commented on LUCENE-5030: --- Great! Thanks for reviewing. FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Assignee: Michael McCandless Fix For: 5.0, 4.5 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, run-suggest-benchmark.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
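[Editor's note] The byte-space vs code-point-space mismatch described in LUCENE-5030 can be seen with plain Java string operations: ASCII text has one UTF-8 byte per code point, so the two edit spaces agree, but every Cyrillic character is two UTF-8 bytes, so one character edit looks like two byte-level edits (a sketch of the symptom, not the suggester fix itself):

```java
import java.nio.charset.StandardCharsets;

// Compares the two "sizes" of a string that matter for edit distance:
// Unicode code points vs UTF-8 bytes.
public class EditSpaceDemo {
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // English: 4 code points, 4 UTF-8 bytes — byte edits == character edits.
        System.out.println(codePoints("test") + " " + utf8Bytes("test"));
        // Russian: 4 code points, 8 UTF-8 bytes — one character edit costs two
        // byte edits, which is why minFuzzyLength behaved differently.
        System.out.println(codePoints("тест") + " " + utf8Bytes("тест"));
    }
}
```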
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712496#comment-13712496 ] ASF subversion and git services commented on LUCENE-4542: - Commit 1504529 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1504529 ] LUCENE-4542: Make HunspellStemFilter's maximum recursion level configurable. Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Adrien Grand Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
With index-time synonyms you bloat the index with a lot of new postings, most of them just duplicates of each other. And in my case, because every synonym has a weight, I cannot even consider postings deduplication... There's a tradeoff here (as usual). Both approaches have pros and cons. I think index time is better in the end because a larger index can be solved by throwing more hardware at it. But queries with thousands of terms are a real issue. One thing I can look at is whether the synonym sets can be 'grouped' in a way that instead of all the terms I index a group ID or something, and then during search I resolve a term to all the groups it may belong to... I'll need to think about it more. On Jul 18, 2013 7:49 PM, Walter Underwood wun...@wunderwood.org wrote: There are two serious issues with query-time synonyms, speed and correctness. 1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. This will not be fast. Expanding the term at index time means 1000 posting list entries, but only one term lookup at query time. 2. Query time expansion will give higher scores to the more rare synonyms. This is almost never what you want. If I make TV and television synonyms, I want them both to score the same. But if TV is 10X more common than television, then documents with the rare term (television) will score better. wunder On Jul 18, 2013, at 5:54 AM, Shai Erera wrote: The examples I've seen so far are single words. But I learned today something new .. the number of synonyms returned for a word may be in the range of hundreds, sometimes even thousands. So I'm not sure query-time synonyms may work at all .. what do you think? Shai On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote: Your best bet is to preprocess queries and expand synonyms in your own application layer.
The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms? -- Jack Krupansky *From:* Shai Erera ser...@gmail.com *Sent:* Thursday, July 18, 2013 1:36 AM *To:* dev@lucene.apache.org *Subject:* Programmatic Synonyms Filter (Lucene and/or Solr) Hi I was asked to integrate with a system which provides synonyms for words through an API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front. E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API. Solr's SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either. The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word). So I have a few questions: 1) Did I miss a Filter which does take a programmatic syn-map which I can provide my own impl to? 2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl, so that users can provide their own impl, e.g. not requiring everything to be in RAM? 2.1) A side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of many use cases for not using SynonymFilter as-is ... 3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right? 
Shai -- Walter Underwood wun...@wunderwood.org
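Shai's group-ID idea — index one identifier per synonym set instead of every member term — can be sketched as follows (hypothetical data and names; weights and multi-word synonyms are ignored here):

```python
# Hypothetical synonym sets, keyed by an invented group ID.
syn_sets = {
    "g1": {"tv", "television", "telly"},
    "g2": {"couch", "sofa"},
}

def index_terms(term):
    # Index time: emit the group ID(s) a term belongs to instead of
    # emitting every synonym as its own posting.
    groups = [g for g, members in syn_sets.items() if term in members]
    return groups or [term]  # terms without synonyms are indexed as-is

def query_terms(term):
    # Query time: resolve the query term to the same group IDs, so a
    # thousand-member synonym set becomes a single-term lookup.
    return index_terms(term)
```

The index stays small and a query term resolves to a handful of group IDs rather than thousands of synonym clauses; the open question from the thread is how the per-synonym weights would survive the grouping.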
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712503#comment-13712503 ] ASF subversion and git services commented on LUCENE-4542: - Commit 1504531 from [~jpountz] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1504531 ] LUCENE-4542: Make HunspellStemFilter's maximum recursion level configurable.
[jira] [Resolved] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-4542. -- Resolution: Fixed Fix Version/s: 4.5 Committed, thanks!
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Adding terms to posting lists is about the most space-efficient thing you can do in a search engine, so I would not worry too much about that. wunder -- Walter Underwood wun...@wunderwood.org
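Walter's correctness point earlier in the thread — query-time expansion boosts the rarer synonym — follows directly from the idf term. A small numeric check with made-up document frequencies (a classic 1 + ln(N/(df+1)) style idf, used here only for illustration):

```python
import math

N = 1_000_000  # documents in the index (made-up)
df = {"tv": 100_000, "television": 10_000}  # "tv" is 10x more common

def idf(term):
    # Illustrative idf: rarer terms get larger weights.
    return 1.0 + math.log(N / (df[term] + 1))

# With query-time expansion, a document containing only "television"
# outscores one containing only "tv" -- the opposite of what you want
# for synonyms that should score the same.
```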
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Summary: Pluggable Analytics (was: Aggregating Collectors and AggregatorComponent) Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins.
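The collect-time aggregation idea — a delegating collector that computes a statistic while passing documents through — can be modeled in a few lines (illustrative Python, not the Solr Aggregator API; class and field names are invented):

```python
class SumCollector:
    # Delegating collector: accumulates a sum over one field while
    # forwarding every document to the wrapped collector, mirroring the
    # "aggregate at collect time" idea behind the Aggregator interface.
    def __init__(self, field, delegate):
        self.field = field
        self.delegate = delegate
        self.total = 0

    def collect(self, doc):
        self.total += doc[self.field]
        self.delegate.collect(doc)

class ListCollector:
    # Stand-in for the normal result-gathering collector.
    def __init__(self):
        self.docs = []

    def collect(self, doc):
        self.docs.append(doc)

inner = ListCollector()
agg = SumCollector("popularity", inner)
for d in [{"popularity": 3}, {"popularity": 7}]:
    agg.collect(d)
# agg.total now holds the sum; inner still received every document.
```

The point of the delegating design is that aggregation piggybacks on the single collection pass instead of requiring a second scan over the results.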
[jira] [Updated] (SOLR-5045) Aggregating Collectors and AggregatorComponent
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Attachment: SOLR-5045.patch Added SumQParserPlugin which is a very simple *Aggregator* implementation. At this point this class is only there to show the mechanics of how an Aggregator works and to test the framework. This class will be iterated on to make it more full featured.
[jira] [Commented] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY
[ https://issues.apache.org/jira/browse/SOLR-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712511#comment-13712511 ] ASF subversion and git services commented on SOLR-5049: --- Commit 1504532 from hoss...@apache.org in branch 'dev/branches/lucene_solr_4_4' [ https://svn.apache.org/r1504532 ] SOLR-5049: use '4.4' as luceneMatchVersion in all example solrconfig.xml files switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY Key: SOLR-5049 URL: https://issues.apache.org/jira/browse/SOLR-5049 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.4 Uwe just pointed out to me on IRC that you can specify {{<luceneMatchVersion/>}} using X.Y instead of the more internal-Java-variable-esque LUCENE_XY. I have no idea why we haven't been doing this in the past ... it makes so much more sense for end users; we should absolutely do this moving forward.
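Concretely, the example configs would carry the plain version form; a sketch of the relevant solrconfig.xml line:

```xml
<!-- luceneMatchVersion in example solrconfig.xml after SOLR-5049 -->
<luceneMatchVersion>4.4</luceneMatchVersion>
<!-- instead of the internal-constant style, e.g. LUCENE_44 -->
```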
RE: Programmatic Synonyms Filter (Lucene and/or Solr)
I think Jack is implicitly referring to Solr. In the case of a pure Lucene application without Solr, or a custom query parser plugged into Solr that does the query-time expansion, the limit is not the URL length (which only applies to Solr, as the query is part of the URL), but more the fact that Lucene refuses to run with more than 1024 BQ clauses. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ eMail: u...@thetaphi.de From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, July 18, 2013 4:05 PM To: dev@lucene.apache.org Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr) Container (e.g., Tomcat) limit. Configurable. I don't recall the specifics. -- Jack Krupansky From: Shai Erera ser...@gmail.com Sent: Thursday, July 18, 2013 9:46 AM To: dev@lucene.apache.org Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr) Actually, after chatting w/ Mike about it, he made a good point against making SynMap expose an API like lookup(word): it doesn't work with multi-word synonyms (e.g. wi fi -> wifi). So I no longer think we should change SynFilter. Since in my case it's 1:1 (so much I've learned so far), I should write my own TokenFilter. So now the question is whether to do it at indexing time or search time. Each has pros and cons. I'll need to learn more about the DB first, e.g. how many words have only tens of synonyms and how many have thousands. I suspect there's no single solution here, so I will need to experiment with both. Jack, I didn't quite follow the 2048 common limit -- is it a Solr limit of some sort? If so, can you please elaborate? Shai On Thu, Jul 18, 2013 at 4:12 PM, Jack Krupansky j...@basetechnology.com wrote: Maybe a custom search component would be in order, to "enrich" the incoming query. Again, preprocessing the query for synonym expansion before Solr parses it. It could call the external synonym API and cache synonyms as well. 
But I'd still lean towards preprocessing in an application layer. Although, for hundreds or thousands of synonyms, it would probably hit the common 2048-character limit for URLs in some containers, which would need to be raised. -- Jack Krupansky
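Uwe's point about the BooleanQuery clause limit puts a hard bound on naive query-time expansion: each synonym becomes one clause, and Lucene's default cap is 1024. A quick sanity check (the helper function is illustrative, not a Lucene API):

```python
MAX_CLAUSES = 1024  # Lucene's default BooleanQuery clause limit

def fits_in_one_boolean_query(synonyms):
    # One clause per synonym plus one for the original term: a synonym
    # set larger than the cap cannot be expanded into a single
    # BooleanQuery without raising maxClauseCount.
    return len(synonyms) + 1 <= MAX_CLAUSES

small = ["tv", "telly"]
huge = [f"syn{i}" for i in range(2000)]  # thousands of synonyms, as in the thread
```

So for the "thousands of synonyms per word" case described earlier, query-time expansion would need either a raised clause limit or a different query structure.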
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712517#comment-13712517 ] Jack Krupansky commented on SOLR-5048: -- On the ReleaseToDo wiki I see this step: Find/replace LUCENE_XY -> LUCENE_X(Y+1) across all of Lucene and Solr, but do NOT change usages under lucene/analysis/ that allow version-specific behavior. But when I look at the example solrconfig.xml in branch_4x, it still says LUCENE_43, suggesting that this step has been skipped twice now. I think it should say 4.5, since LUCENE_45 is the only un-deprecated version for branch_4x now, right? Maybe the step After creating a new release branch, update the version in the base release branch (e.g. branches/branch_4x) needed to be reviewed (or merely followed) a little more closely. See: http://wiki.apache.org/lucene-java/ReleaseTodo#Branching_.26_Feature_Freeze fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion -- Key: SOLR-5048 URL: https://issues.apache.org/jira/browse/SOLR-5048 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5048.patch 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... the build should fail in a situation like this.
[jira] [Created] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
Robert Muir created LUCENE-5119: --- Summary: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory Key: LUCENE-5119 URL: https://issues.apache.org/jira/browse/LUCENE-5119 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5119.patch These are accessed sequentially when e.g. faceting, and can be a fairly large amount of data (based on # of docs and # of unique terms). I think this was done so that conceptually random access to a specific docid would be faster than eg. stored fields, but I think we should instead target the DV datastructures towards real use cases (faceting,sorting,grouping,...)
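To get a feel for why keeping doc-to-ord on heap matters, a rough back-of-the-envelope estimate (made-up index sizes; a packed array needs about log2(#terms) bits per document):

```python
import math

def doc_to_ord_heap_bytes(num_docs, num_unique_terms):
    # A packed doc->ord array needs ceil(log2(#terms)) bits per document;
    # this is a rough estimate of what keeping it on-heap costs.
    bits_per_ord = max(1, math.ceil(math.log2(num_unique_terms)))
    return num_docs * bits_per_ord // 8

# 100M docs with 10M unique terms: 24 bits/doc ~ 300 MB of heap
# just for the doc->ord mapping of a single field.
heap = doc_to_ord_heap_bytes(100_000_000, 10_000_000)
```

For sequential access patterns like faceting and sorting, streaming this mapping from disk trades that heap cost for reads the OS cache handles well.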
[jira] [Updated] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5119: Attachment: LUCENE-5119.patch
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712526#comment-13712526 ] Steve Rowe commented on SOLR-5048: -- bq. Maybe the step After creating a new release branch, update the version in the base release branch (e.g. branches/branch_4x) needed to be reviewed (or merely followed) a little more closely. I agree - my responsibility this time around, and I can see that I missed these in all the solrconfig.xml's ... mea culpa. Nice catch, Jack.
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712527#comment-13712527 ] Uwe Schindler commented on SOLR-5048: - We should also change the release todo to now use {{x.y}} as the format in solrconfig.xml, see SOLR-5049. bq. I think it should say 4.5, since LUCENE_45 is the only un-deprecated version for branch_4x now, right? Yes.
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling to sum the field popularity. aggregate=true - turns on the AggregatorComponent
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling to sum the field popularity. aggregate=true - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins.
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. 
This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent -- This message is automatically generated by JIRA. 
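The collect-time aggregation idea in the ticket can be sketched in plain Java, with no Solr dependencies (SimpleCollector, DelegatingSumCollector, and the per-doc value array are illustrative stand-ins, not the actual SOLR-5045 Aggregator/PostFilter classes):

```java
/** Illustrative stand-in for a Lucene collector: receives matching doc ids. */
interface SimpleCollector {
    void collect(int docId);
}

/** Sketch of the Aggregator idea: a delegating collector that sums a
 *  per-document value as each hit is collected, then passes the doc on so
 *  normal collection (ranking, collapsing, etc.) continues unchanged. */
class DelegatingSumCollector implements SimpleCollector {
    private final SimpleCollector delegate;
    private final long[] fieldValues; // stand-in for a per-doc field cache
    private long sum = 0;

    DelegatingSumCollector(SimpleCollector delegate, long[] fieldValues) {
        this.delegate = delegate;
        this.fieldValues = fieldValues;
    }

    @Override
    public void collect(int docId) {
        sum += fieldValues[docId]; // aggregate at collect time
        delegate.collect(docId);   // then delegate downstream
    }

    long getSum() { return sum; }
}

public class AggregatorSketch {
    /** Sums the hypothetical popularity values of the matching docs. */
    public static long sumOf(long[] popularity, int[] matchingDocs) {
        DelegatingSumCollector c =
                new DelegatingSumCollector(docId -> {}, popularity);
        for (int doc : matchingDocs) {
            c.collect(doc);
        }
        return c.getSum();
    }

    public static void main(String[] args) {
        long[] popularity = {10, 20, 25, 30};
        System.out.println(sumOf(popularity, new int[]{0, 2, 3})); // 10+25+30 = 65
    }
}
```

Because the aggregation happens inside the delegating collector rather than over a cached result set, this is consistent with the ticket's claim that it avoids the SOLR-4465 caching issues.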
[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-4542: --- Fix Version/s: (was: 4.5) 4.4 5.0 Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Steve Rowe Fix For: 5.0, 4.4 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue I've filed, so please forgive any mistakes.)
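The change the ticket asks for, replacing the hard-coded `private static final int RECURSION_CAP = 2` with a value the caller can tune, can be sketched as a toy stand-alone class (CappedExpander and expand() are illustrative names only, not the actual HunspellStemmer API; the real change threads the cap through the stemmer's constructor):

```java
/** Toy sketch of LUCENE-4542: the recursion cap becomes a constructor
 *  argument instead of a hard-coded static final constant, so callers can
 *  trade stemming quality for speed (e.g. cap=1 instead of cap=2). */
class CappedExpander {
    private final int recursionCap;

    CappedExpander(int recursionCap) {
        this.recursionCap = recursionCap;
    }

    /** Recursively "expands" a word, stopping at the configured cap.
     *  Returns the number of recursion levels actually executed. */
    int expand(String word, int depth) {
        if (depth >= recursionCap || word.isEmpty()) {
            return depth;
        }
        // a real stemmer would apply affix rules here and recurse per rule
        return expand(word.substring(1), depth + 1);
    }
}

public class RecursionCapDemo {
    public static void main(String[] args) {
        System.out.println(new CappedExpander(2).expand("sentence", 0)); // 2
        System.out.println(new CappedExpander(1).expand("sentence", 0)); // 1
    }
}
```

The point of the sketch is only that the cap is per-instance state: two analyzers over different dictionaries can now use different caps.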
[jira] [Reopened] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe reopened LUCENE-4542: Assignee: Steve Rowe (was: Adrien Grand) Reopening to backport to 4.4, based on conversation with Adrien on #lucene-dev IRC. Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Steve Rowe Fix For: 4.5 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue I've filed, so please forgive any mistakes.)
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. 
This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling to sum the field popularity. aggregate=true - turns on the AggregatorComponent Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*%3A*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent -- This message is automatically generated by JIRA. 
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. 
This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent -- This message is automatically generated by JIRA. 
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. 
This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin telling it to sum the field popularity. aggregate=true - turns on the AggregatorComponent -- This message is automatically generated by JIRA. 
Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Agreed, to some extent. I think mostly it is a matter of how frequently the synonyms may be updated. OTOH, it is always technically possible to analyze synonym updates, perform queries on both the old and new synonyms, and then update the index for the documents containing synonym changes. How much do you know about the frequency of synonym updates for this synonym source API? -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 1:06 PM To: dev@lucene.apache.org Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr) With index-time synonyms you bloat the index with a lot of new postings, most of them just duplicates of each other. And in my case, because for every synonym there's a weight, I cannot even consider postings deduplication... There's a tradeoff here (as usual). Both approaches have pros and cons. I think index time is better in the end because a larger index can be solved by throwing more hardware at it. But queries with thousands of terms are a real issue. One thing I can look at is whether the synonym sets can be 'grouped' in a way that, instead of all the terms, I index a group ID or something, and then during search I resolve a term to all the groups it may belong to... I'll need to think about it more. On Jul 18, 2013 7:49 PM, Walter Underwood wun...@wunderwood.org wrote: There are two serious issues with query-time synonyms, speed and correctness. 1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. This will not be fast. Expanding the term at index time means 1000 posting list entries, but only one term lookup at query time. 2. Query-time expansion will give higher scores to the rarer synonyms. This is almost never what you want. If I make TV and television synonyms, I want them both to score the same. But if TV is 10X more common than television, then documents with the rare term (television) will score better. 
wunder On Jul 18, 2013, at 5:54 AM, Shai Erera wrote: The examples I've seen so far are single words. But I learned today something new .. the number of synonyms returned for a word may be in the range of hundreds, sometimes even thousands. So I'm not sure query-time synonyms may work at all .. what do you think? Shai On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote: Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture is fairly lightweight (although FST is a big improvement) and not architected for large and dynamic synonym sets. Do you need multi-word phrase synonyms as well, or is this strictly single-word synonyms? -- Jack Krupansky From: Shai Erera Sent: Thursday, July 18, 2013 1:36 AM To: dev@lucene.apache.org Subject: Programmatic Synonyms Filter (Lucene and/or Solr) Hi I was asked to integrate with a system which provides synonyms for words through API. I checked the existing synonym filters in Lucene and Solr and they all seem to take a synonyms map up front. E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's not really programmatic in the sense that I can provide an impl which will pull the synonyms through the other system's API. Solr SynonymFilterFactory just loads the synonyms from a file into a SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can extend that one either. The problem is that the synonyms DB I should integrate with is HUGE and will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all available synonyms from it in one go. The API I have is something like String[] getSynonyms(String word). So I have few questions: 1) Did I miss a Filter which does take a programmatic syn-map which I can provide my own impl to? 
2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl, so that users can provide their own impl, e.g. not requiring everything to be in RAM? 2.1) A side-effect benefit, I think, is that we won't require everyone to deal with the FST API that way, though I'll admit I cannot think of many use cases for not using SynonymFilter as-is ... 3) If the answer to (1) and (2) is NO, I guess my only option is to implement my own SynonymFilter, copying most of the code from Lucene's ... right? Shai -- Walter Underwood wun...@wunderwood.org
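The `String[] getSynonyms(String word)` API Shai describes could sit behind a small provider interface, so a RAM-resident map and an external synonym service are interchangeable. A minimal sketch (SynonymProvider and InMemoryProvider are hypothetical names for this thread's idea, not Lucene or Solr API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup abstraction matching the API discussed in the thread:
 *  an implementation could call out to an external synonyms DB instead of
 *  holding a SynonymMap/FST entirely in RAM. */
interface SynonymProvider {
    String[] getSynonyms(String word);
}

/** Trivial in-memory implementation, standing in for the external service. */
class InMemoryProvider implements SynonymProvider {
    private final Map<String, String[]> table = new HashMap<>();

    InMemoryProvider put(String word, String... synonyms) {
        table.put(word, synonyms);
        return this;
    }

    @Override
    public String[] getSynonyms(String word) {
        return table.getOrDefault(word, new String[0]);
    }
}

public class SynonymLookupDemo {
    public static void main(String[] args) {
        SynonymProvider provider =
                new InMemoryProvider().put("tv", "television", "telly");
        // a token filter would call this once per token instead of
        // consulting a prebuilt FST
        System.out.println(String.join(",", provider.getSynonyms("tv")));
    }
}
```

A custom TokenFilter wrapping such a provider would still have to handle position increments and (in Shai's case) per-synonym weights, which is where most of the real work in a SynonymFilter rewrite would go.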
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712539#comment-13712539 ] ASF subversion and git services commented on LUCENE-4542: - Commit 1504561 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1504561 ] LUCENE-4542: move CHANGES.txt entry to the 4.4 section Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Steve Rowe Fix For: 5.0, 4.4 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue I've filed, so please forgive any mistakes.)
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*wt=xmlindent=truefq=\{!sum field=popularity id=mysum\}aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. 
This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output will contain a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. 
Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712541#comment-13712541 ] ASF subversion and git services commented on LUCENE-4542: - Commit 1504563 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1504563 ] LUCENE-4542: move CHANGES.txt entry to the 4.4 section (merged trunk r1504561) Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Steve Rowe Fix For: 5.0, 4.4 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36ms to stem a long sentence in Latvian for recursion_cap=2 and 5ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's my first issue, so please forgive me any mistakes.)
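The refactoring LUCENE-4542 asks for amounts to turning a hard-coded constant into a tunable parameter while keeping the old default. The class and method names in this sketch are illustrative stand-ins, not Lucene's actual HunspellStemmer API:

```java
// Sketch of making a hard-coded recursion cap configurable, in the spirit of
// LUCENE-4542. StemmerSketch is hypothetical; only the pattern is the point.
public class StemmerSketch {

    public static final int DEFAULT_RECURSION_CAP = 2;

    private final int recursionCap;

    public StemmerSketch() {
        this(DEFAULT_RECURSION_CAP); // old behavior preserved by default
    }

    public StemmerSketch(int recursionCap) {
        this.recursionCap = recursionCap;
    }

    /** Counts how many recursive expansion levels would run for this cap. */
    public int expansionSteps() {
        int steps = 0;
        for (int depth = 0; depth < recursionCap; depth++) {
            steps++; // placeholder for the per-level affix expansion work
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(new StemmerSketch().expansionSteps());  // default cap of 2
        System.out.println(new StemmerSketch(1).expansionSteps()); // cheaper cap of 1
    }
}
```

Keeping the default in a no-arg constructor means existing callers see no behavior change, while performance-sensitive users (the multi-dictionary Latvian case above) can pass a lower cap.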
[jira] [Created] (SOLR-5050) forbidden-apis errors
Hoss Man created SOLR-5050: -- Summary: forbidden-apis errors Key: SOLR-5050 URL: https://issues.apache.org/jira/browse/SOLR-5050 Project: Solr Issue Type: Bug Reporter: Hoss Man I'm not sure if I'm the only one seeing this, or if it's a relatively newly introduced error, but on trunk... {noformat} [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden method invocation: java.util.Properties#load(java.io.InputStream) [Properties files should be read/written with Reader/Writer, using UTF-8 charset. This allows reading older files with unicode escapes, too.] [forbidden-apis] in org.apache.solr.core.SolrCoreDiscoverer (SolrCoreDiscoverer.java:75) [forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath! [forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath! [forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix the classpath! [forbidden-apis] WARNING: The referenced class 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please fix the classpath! [forbidden-apis] Scanned 2442 (and 1361 related) class file(s) for forbidden API invocations (in 1.91s), 1 error(s). BUILD FAILED /home/hossman/lucene/dev/build.xml:67: The following error occurred while executing this line: /home/hossman/lucene/dev/solr/build.xml:263: Check for forbidden API calls failed, see log. {noformat}
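The flagged call is Properties#load(InputStream), and the rule's own message states the fix: read and write properties through a Reader/Writer with an explicit UTF-8 charset. This standalone snippet (not the actual SolrCoreDiscoverer code) shows the replacement and that non-Latin-1 values round-trip:

```java
// Replacing the forbidden Properties#load(InputStream) with an explicit
// UTF-8 Reader, as the forbidden-apis message suggests. Standalone sketch,
// not the actual SolrCoreDiscoverer fix.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesUtf8 {

    /** Writes and re-reads properties through explicit UTF-8 Writer/Reader. */
    public static Properties roundTrip(Properties in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            // Write with a UTF-8 Writer instead of store(OutputStream, ...)
            in.store(new OutputStreamWriter(buf, StandardCharsets.UTF_8), null);

            Properties out = new Properties();
            // Read with a UTF-8 Reader instead of the forbidden
            // load(InputStream), which assumed ISO-8859-1.
            out.load(new InputStreamReader(
                    new ByteArrayInputStream(buf.toByteArray()),
                    StandardCharsets.UTF_8));
            return out;
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen with byte arrays
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("coreName", "collection1");
        p.setProperty("greeting", "žēlīgi"); // non-Latin-1 chars survive UTF-8
        System.out.println(roundTrip(p).getProperty("greeting"));
    }
}
```

Properties#load(Reader) and store(Writer, ...) have existed since Java 6, so this change is safe on the JDK 7 builds shown in this thread.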
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output will contain a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. 
The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. 
Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output will contain a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. 
The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. 
Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712551#comment-13712551 ] David Smiley commented on LUCENE-5119: -- Would it be easy to add random access as an option? Looking at your patch, which was pretty simple, it doesn't appear that it'd be hard to support random access should an application wish to use it. A realistic example in my mind is a spatial filter in which a potentially large binary geometry representation of a shape is encoded for each document into DiskDV. Some fast leading filters narrow down the applicable documents, but some documents' shape geometries need to be consulted in the DiskDV afterwards. Does that make sense? DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory -- Key: LUCENE-5119 URL: https://issues.apache.org/jira/browse/LUCENE-5119 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5119.patch These are accessed sequentially when e.g. faceting, and can be a fairly large amount of data (based on # of docs and # of unique terms). I think this was done so that conceptually random access to a specific docid would be faster than e.g. stored fields, but I think we should instead target the DV data structures towards real use cases (faceting, sorting, grouping, ...)
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712548#comment-13712548 ] Jack Krupansky commented on SOLR-5045: -- One interesting test case: There has been some interest in adding median to the stats component. The difficulty is that you need to build up the frequency distribution so that you can find the value that is >= half of the values, which is a lot more effort than simply adding values to an accumulator. Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. 
*aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
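Jack's point above is the classic contrast between distributive and holistic aggregates: a sum needs only one running accumulator, while a median cannot be answered until the whole value distribution has been seen. This standalone sketch (illustrative only, not Solr's stats component) makes the difference concrete:

```java
// Contrast between a sum (O(1) extra memory, one accumulator) and a median
// (needs the full distribution). Standalone illustration for the SOLR-5045
// discussion; not Solr code.
import java.util.Arrays;

public class MedianVsSum {

    /** One accumulator, updated per collected value. */
    public static long sum(long[] values) {
        long acc = 0;
        for (long v : values) {
            acc += v;
        }
        return acc;
    }

    /** Must retain (or count) every value before it can answer. */
    public static double median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted); // the full distribution is required
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        long[] popularity = {10, 30, 20, 25};
        System.out.println(sum(popularity));    // 85, matching the sample output above
        System.out.println(median(popularity)); // 22.5
    }
}
```

In a distributed setting the gap widens further: per-shard sums merge by addition, but per-shard medians do not merge exactly, which is why median support demands shipping frequency distributions between shards.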
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. 
The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. 
It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
[jira] [Commented] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY
[ https://issues.apache.org/jira/browse/SOLR-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712549#comment-13712549 ] ASF subversion and git services commented on SOLR-5049: --- Commit 1504566 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1504566 ] SOLR-5049: use '5.0' as luceneMatchVersion in all example solrconfig.xml files switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY Key: SOLR-5049 URL: https://issues.apache.org/jira/browse/SOLR-5049 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.4 Uwe just pointed out to me on IRC that you can specify {{luceneMatchVersion}} using X.Y instead of the more internal-Java-variable-esque LUCENE_XY. I have no idea why we haven't been doing this in the past ... it makes so much more sense for end users; we should absolutely do this moving forward.
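The change committed above amounts to an edit like the following in each example solrconfig.xml. The exact surrounding file contents are illustrative; the two value styles are from the issue title and the commit message:

```xml
<!-- Before: internal-constant style, as in the LUCENE_XY form the title mentions -->
<luceneMatchVersion>LUCENE_50</luceneMatchVersion>

<!-- After SOLR-5049: plain X.Y version string -->
<luceneMatchVersion>5.0</luceneMatchVersion>
```

Both forms are accepted by Solr's version parsing; the X.Y form simply reads more naturally for end users configuring an index.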
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. 
The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. 
It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5045: - Description: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
was: This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. 
The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. 
It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this:
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-5045:
---------------------------------

    Description: 
This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface, providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027.

The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465, which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and, combined with SOLR-5027, plays nicely with field collapsing. It is also much less intrusive on the core code, as it is entirely implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin, telling it to sum the field popularity.
*aggregate=true* - Turns on the AggregatorComponent.

The output contains a block that looks like this:

{code:xml}
&lt;lst name=&quot;aggregates&quot;&gt;
  &lt;lst name=&quot;mysum&quot;&gt;
    &lt;long name=&quot;sum&quot;&gt;85&lt;/long&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
{code}

  was:
This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface, providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027.

The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465, which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and, combined with SOLR-5027, plays nicely with field collapsing. It is also much less intrusive on the core code, as it is entirely implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin, telling it to sum the field popularity.
*aggregate=true* - Turns on the AggregatorComponent.

The output contains a block that looks like this:

&lt;lst name=&quot;aggregates&quot;&gt;
  &lt;lst name=&quot;mysum&quot;&gt;
    &lt;long name=&quot;sum&quot;&gt;85&lt;/long&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712554#comment-13712554 ]

Jack Krupansky commented on SOLR-5045:
--------------------------------------

Can I script some custom analytics? Or is that simply a question of how this new component hooks in with the proposed JavaScriptRequestHandler (SOLR-5005)?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldn't hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712558#comment-13712558 ]

Robert Muir commented on LUCENE-5119:
-------------------------------------

I don't plan to do this. That's why we have a codec API...

DiskDV SortedDocValues shouldn't hold doc-to-ord in heap memory
---------------------------------------------------------------

Key: LUCENE-5119
URL: https://issues.apache.org/jira/browse/LUCENE-5119
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5119.patch

These are accessed sequentially when e.g. faceting, and can be a fairly large amount of data (based on the # of docs and # of unique terms). I think this was done so that conceptually random access to a specific docid would be faster than e.g. stored fields, but I think we should instead target the DV data structures towards real use cases (faceting, sorting, grouping, ...).
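To see why holding the doc-to-ord map in heap memory is costly, a back-of-the-envelope sketch helps: the map is roughly one packed ordinal per document, and the ordinal's bit width grows with the number of unique terms. The index sizes below are hypothetical, not from the ticket.

```java
// Rough heap estimate for a packed per-document ordinal array
// (one ordinal per doc, width driven by the unique-term count).
public class DocToOrdHeap {
    // Bits needed to represent any ordinal in [0, uniqueTerms).
    static long bitsPerOrd(long uniqueTerms) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(uniqueTerms - 1));
    }

    // Approximate heap cost of the packed doc-to-ord array, in bytes.
    static long approxHeapBytes(long numDocs, long uniqueTerms) {
        return numDocs * bitsPerOrd(uniqueTerms) / 8;
    }

    public static void main(String[] args) {
        // Hypothetical index: 100M docs, 10M unique terms
        // -> 24 bits per ordinal -> ~300 MB of heap for one field.
        System.out.println(approxHeapBytes(100_000_000L, 10_000_000L) + " bytes");
    }
}
```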
[jira] [Updated] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-5045:
---------------------------------

    Description: 
This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface, providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027.

The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465, which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and, combined with SOLR-5027, plays nicely with field collapsing. It is also much less intrusive on the core code, as it is entirely implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin, telling it to sum the field popularity.
*aggregate=true* - Turns on the AggregatorComponent.

The output contains a block that looks like this:

{code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code}

  was:
This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface, providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027.

The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465, which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and, combined with SOLR-5027, plays nicely with field collapsing. It is also much less intrusive on the core code, as it is entirely implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin, telling it to sum the field popularity.
*aggregate=true* - Turns on the AggregatorComponent.

The output contains a block that looks like this:

{code:xml}
&lt;lst name=&quot;aggregates&quot;&gt;
  &lt;lst name=&quot;mysum&quot;&gt;
    &lt;long name=&quot;sum&quot;&gt;85&lt;/long&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
{code}
[jira] [Resolved] (SOLR-5050) forbidden-apis errors
[ https://issues.apache.org/jira/browse/SOLR-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-5050.
----------------------------
Resolution: Not A Problem

my mistake ... apparently i wasn't as clean as i thought

forbidden-apis errors
---------------------

Key: SOLR-5050
URL: https://issues.apache.org/jira/browse/SOLR-5050
Project: Solr
Issue Type: Bug
Reporter: Hoss Man

I'm not sure if i'm the only one seeing this, or if it's a relatively newly introduced error, but on trunk...

{noformat}
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: java.util.Properties#load(java.io.InputStream) [Properties files should be read/written with Reader/Writer, using UTF-8 charset. This allows reading older files with unicode escapes, too.]
[forbidden-apis]   in org.apache.solr.core.SolrCoreDiscoverer (SolrCoreDiscoverer.java:75)
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please fix the classpath!
[forbidden-apis] Scanned 2442 (and 1361 related) class file(s) for forbidden API invocations (in 1.91s), 1 error(s).

BUILD FAILED
/home/hossman/lucene/dev/build.xml:67: The following error occurred while executing this line:
/home/hossman/lucene/dev/solr/build.xml:263: Check for forbidden API calls failed, see log.
{noformat}
[jira] [Closed] (SOLR-5050) forbidden-apis errors
[ https://issues.apache.org/jira/browse/SOLR-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler closed SOLR-5050.
-------------------------------
[jira] [Updated] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-5048:
---------------------------
Attachment: SOLR-5048.patch

much simpler patch that depends on the changes in SOLR-5049

fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
----------------------------------------------------------------------------------------------

Key: SOLR-5048
URL: https://issues.apache.org/jira/browse/SOLR-5048
Project: Solr
Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
Attachments: SOLR-5048.patch, SOLR-5048.patch

4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... the build should fail in a situation like this.
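The check the ticket asks for boils down to extracting the luceneMatchVersion element from each example solrconfig.xml and failing when it lags the release being built. The real implementation lives in the build scripts attached to SOLR-5048/SOLR-5049; this standalone sketch only illustrates the shape of the check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative version of the SOLR-5048 check: pull <luceneMatchVersion>
// out of a solrconfig.xml string and compare it to the expected release.
public class MatchVersionCheck {
    private static final Pattern P =
        Pattern.compile("<luceneMatchVersion>\\s*([^<\\s]+)\\s*</luceneMatchVersion>");

    static String extract(String solrconfig) {
        Matcher m = P.matcher(solrconfig);
        if (!m.find()) throw new IllegalStateException("no luceneMatchVersion found");
        return m.group(1);
    }

    public static void main(String[] args) {
        // The 4.4 RC0 situation from the ticket: config still says LUCENE_43.
        String config = "<config><luceneMatchVersion>LUCENE_43</luceneMatchVersion></config>";
        String expected = "LUCENE_44";
        String actual = extract(config);
        if (!actual.equals(expected)) {
            // The real build would fail here, which is exactly what the ticket wants.
            System.out.println("stale luceneMatchVersion: " + actual
                + " (expected " + expected + ")");
        }
    }
}
```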
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712566#comment-13712566 ]

Joel Bernstein commented on SOLR-5045:
--------------------------------------

You have the flexibility to calculate median, atleast on a single server. Not sure what the best approach to this would be. Distributed median may be harder. You'd have to build up distributions in a way that can be merged.

Scripting is a very cool thing. I need to do some research though on SOLR-5005 and see if can be applied.
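One mergeable per-shard summary of the kind Joel describes is a fixed-width histogram: each shard builds one independently, merging is element-wise addition of bucket counts (so shards can be combined in any order), and an approximate median can be read off the merged counts. This is a hypothetical sketch of the idea, not code from the patch.

```java
// A fixed-width histogram as a mergeable summary for approximate
// distributed median. Names and bucket scheme are illustrative.
public class MergeableHistogram {
    final long[] counts;      // counts[i] = docs whose value falls in bucket i
    final double bucketWidth;

    MergeableHistogram(int buckets, double bucketWidth) {
        this.counts = new long[buckets];
        this.bucketWidth = bucketWidth;
    }

    void add(double value) {
        int b = Math.min(counts.length - 1, (int) (value / bucketWidth));
        counts[b]++;
    }

    // Merging is element-wise addition, so the merge is associative and
    // commutative -- exactly what a distributed merge step needs.
    void merge(MergeableHistogram other) {
        for (int i = 0; i < counts.length; i++) counts[i] += other.counts[i];
    }

    // Approximate median: midpoint of the bucket holding the 50th percentile.
    double approxMedian() {
        long total = 0;
        for (long c : counts) total += c;
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen * 2 >= total) return (i + 0.5) * bucketWidth;
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        MergeableHistogram shardA = new MergeableHistogram(10, 1.0);
        MergeableHistogram shardB = new MergeableHistogram(10, 1.0);
        shardA.add(1.0); shardA.add(2.0); shardA.add(2.5); // values seen on shard A
        shardB.add(3.0); shardB.add(7.0);                  // values seen on shard B
        shardA.merge(shardB);  // the distributed merge step
        System.out.println("approx median = " + shardA.approxMedian());
    }
}
```

The accuracy of the median is bounded by the bucket width; tighter buckets (or a quantile sketch) trade memory for precision.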
[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion
[ https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712567#comment-13712567 ]

ASF subversion and git services commented on SOLR-5048:
-------------------------------------------------------

Commit 1504570 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1504570 ]

SOLR-5048: fail the build if the example solrconfig.xml files do not have an up to date luceneMatchVersion
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712568#comment-13712568 ]

ASF subversion and git services commented on LUCENE-4542:
---------------------------------------------------------

Commit 1504571 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1504571 ]

LUCENE-4542: Make HunspellStemFilter's maximum recursion level configurable; move CHANGES.txt entry to the 4.4 section (merged trunk r1504529 and r1504561)

Make RECURSION_CAP in HunspellStemmer configurable
--------------------------------------------------

Key: LUCENE-4542
URL: https://issues.apache.org/jira/browse/LUCENE-4542
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Steve Rowe
Fix For: 5.0, 4.4
Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch

Currently there is {{private static final int RECURSION_CAP = 2;}} in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (f.ex. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's my first issue, so please forgive me any mistakes.)
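The requested change is essentially turning the hard-coded constant into a constructor argument that bounds recursion depth. The toy stemmer below (not the Hunspell code itself; it just strips one trailing "s" per recursion level) shows the shape of that change and why the cap directly bounds the work done per word.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of making a recursion cap configurable, in the spirit of
// LUCENE-4542. The stemming rule here is deliberately trivial.
public class CappedStemmer {
    private final int recursionCap;  // was: private static final int RECURSION_CAP = 2;

    public CappedStemmer(int recursionCap) {
        this.recursionCap = recursionCap;
    }

    // Collect candidate stems by repeatedly stripping a trailing "s",
    // recursing at most recursionCap levels deep.
    public List<String> stem(String word) {
        List<String> stems = new ArrayList<>();
        stem(word, 0, stems);
        return stems;
    }

    private void stem(String word, int depth, List<String> out) {
        out.add(word);
        if (depth >= recursionCap) return;  // the configurable cap bounds the work
        if (word.endsWith("s") && word.length() > 1) {
            stem(word.substring(0, word.length() - 1), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(new CappedStemmer(1).stem("bosss"));  // shallower, faster
        System.out.println(new CappedStemmer(2).stem("bosss"));  // deeper, more candidates
    }
}
```

A cap of 1 does strictly less work than a cap of 2, which is the speed/recall trade-off the reporter measured (5 ms vs 36 ms on their Latvian test sentence).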
[jira] [Comment Edited] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712566#comment-13712566 ]

Joel Bernstein edited comment on SOLR-5045 at 7/18/13 5:59 PM:
---------------------------------------------------------------

You have the flexibility to calculate median, atleast on a single server. Not sure what the best approach to this would be. Distributed median may be harder. You'd have to build up distributions in a way that can be merged.

Scripting is a very cool thing. I need to do some research though on SOLR-5005 and see if it can be applied.

  was (Author: joel.bernstein):
You have the flexibility to calculate median, atleast on a single server. Not sure what the best approach to this would be. Distributed median may be harder. You'd have to build up distributions in a way that can be merged.

Scripting is a very cool thing. I need to do some research though on SOLR-5005 and see if can be applied.