[jira] [Commented] (SOLR-10393) Add UUID Stream Evaluator
[ https://issues.apache.org/jira/browse/SOLR-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952512#comment-15952512 ] ASF subversion and git services commented on SOLR-10393: Commit 7e8272c89ec42519894b64c0ac576a1a2889bd32 in lucene-solr's branch refs/heads/branch_6x from [~dpgove] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7e8272c ] SOLR-10393: Adds UUID Streaming Evaluator > Add UUID Stream Evaluator > - > > Key: SOLR-10393 > URL: https://issues.apache.org/jira/browse/SOLR-10393 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Dennis Gove > Attachments: SOLR-10393.patch, SOLR-10393.patch > > > The cartesianProduct function emits multiple tuples from a single tuple. To > save the cartesian product in another collection it would be useful to be > able to dynamically assign new unique id's to tuples. The uuid() stream > evaluator will allow us to do this. > sample syntax: > {code} > cartesianProduct(expr, fielda, uuid() as id) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
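The behaviour described in the issue — each tuple emitted by `cartesianProduct(expr, fielda, uuid() as id)` receiving a freshly generated unique id — can be sketched in plain Java. This is only an illustrative sketch: the class and method names below are hypothetical and do not reflect Solr's actual StreamEvaluator API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch (hypothetical names): emulates what
// `cartesianProduct(expr, fielda, uuid() as id)` does conceptually --
// every tuple emitted by the cartesian product gets a dynamically
// assigned unique id, so the result can be saved to another collection.
public class UuidTupleSketch {

    // Expand one source tuple into one tuple per value of the multi-valued
    // field, assigning a new UUID to each emitted tuple.
    static List<Map<String, Object>> cartesianWithUuid(Map<String, Object> tuple, String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Object value : (List<?>) tuple.get(field)) {
            Map<String, Object> emitted = new HashMap<>(tuple);
            emitted.put(field, value);                        // flatten the multi-value
            emitted.put("id", UUID.randomUUID().toString());  // uuid() as id
            out.add(emitted);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("fielda", List.of("x", "y", "z"));
        for (Map<String, Object> t : cartesianWithUuid(tuple, "fielda")) {
            System.out.println(t); // three tuples, each with a distinct "id"
        }
    }
}
```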
[jira] [Commented] (SOLR-10356) Add Streaming Evaluators for basic math functions
[ https://issues.apache.org/jira/browse/SOLR-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952511#comment-15952511 ] ASF subversion and git services commented on SOLR-10356: Commit 6ce02bc693d4ef67872e9c536155c5308227d6e9 in lucene-solr's branch refs/heads/branch_6x from [~dpgove] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6ce02bc ] SOLR-10356: Adds basic math streaming evaluators > Add Streaming Evaluators for basic math functions > - > > Key: SOLR-10356 > URL: https://issues.apache.org/jira/browse/SOLR-10356 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Attachments: SOLR-10356.patch, SOLR-10356.patch, SOLR-10356.patch, > SOLR-10356.patch > >
[jira] [Commented] (SOLR-10329) Rebuild Solr examples
[ https://issues.apache.org/jira/browse/SOLR-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952513#comment-15952513 ] Avtar Singh commented on SOLR-10329: Hello sir, my name is Avtar Singh. I have previously developed a fact-based question answering system based on Apache Solr and Apache Lucene. I would love to work on this project, and I believe that I can do it very efficiently. Thank you. > Rebuild Solr examples > - > > Key: SOLR-10329 > URL: https://issues.apache.org/jira/browse/SOLR-10329 > Project: Solr > Issue Type: Wish > Components: examples >Reporter: Alexandre Rafalovitch > Labels: gsoc2017 > > Apache Solr ships with a number of examples. They evolved from a kitchen-sink > example and are rather large. When new Solr features are added, they are > often shoehorned into the most appropriate example and sometimes are not > represented at all. > Often, for new users, it is hard to tell what part of an example is relevant, > what part is default, and what part is demonstrating something completely > different. > It would take significant (and very appreciated) effort to review all the > examples and rebuild them to provide a clean way to showcase best practices > around the base and most recent features. > Specific issues are around kitchen-sink vs. minimal examples, a better approach > to "schemaless" mode, and creating examples and datasets that allow creating > both "hello world" and more-advanced tutorials.
[jira] [Updated] (SOLR-10393) Add UUID Stream Evaluator
[ https://issues.apache.org/jira/browse/SOLR-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-10393: --- Attachment: SOLR-10393.patch > Add UUID Stream Evaluator > - > > Key: SOLR-10393 > URL: https://issues.apache.org/jira/browse/SOLR-10393 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Dennis Gove > Attachments: SOLR-10393.patch, SOLR-10393.patch > > > The cartesianProduct function emits multiple tuples from a single tuple. To > save the cartesian product in another collection it would be useful to be > able to dynamically assign new unique id's to tuples. The uuid() stream > evaluator will allow us to do this. > sample syntax: > {code} > cartesianProduct(expr, fielda, uuid() as id) > {code}
[jira] [Commented] (SOLR-10393) Add UUID Stream Evaluator
[ https://issues.apache.org/jira/browse/SOLR-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952503#comment-15952503 ] ASF subversion and git services commented on SOLR-10393: Commit ef821834d15194c2c8b626d494b5119dd42b4f9f in lucene-solr's branch refs/heads/master from [~dpgove] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef82183 ] SOLR-10393: Adds UUID Streaming Evaluator > Add UUID Stream Evaluator > - > > Key: SOLR-10393 > URL: https://issues.apache.org/jira/browse/SOLR-10393 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Dennis Gove > Attachments: SOLR-10393.patch > > > The cartesianProduct function emits multiple tuples from a single tuple. To > save the cartesian product in another collection it would be useful to be > able to dynamically assign new unique id's to tuples. The uuid() stream > evaluator will allow us to do this. > sample syntax: > {code} > cartesianProduct(expr, fielda, uuid() as id) > {code}
[jira] [Commented] (SOLR-10356) Add Streaming Evaluators for basic math functions
[ https://issues.apache.org/jira/browse/SOLR-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952502#comment-15952502 ] ASF subversion and git services commented on SOLR-10356: Commit 674ce4e89393efe3147629e76f053c9901c182dc in lucene-solr's branch refs/heads/master from [~dpgove] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=674ce4e ] SOLR-10356: Adds basic math streaming evaluators > Add Streaming Evaluators for basic math functions > - > > Key: SOLR-10356 > URL: https://issues.apache.org/jira/browse/SOLR-10356 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Attachments: SOLR-10356.patch, SOLR-10356.patch, SOLR-10356.patch, > SOLR-10356.patch > >
[jira] [Commented] (LUCENE-7729) Support for string type separator for CustomSeparatorBreakIterator
[ https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952490#comment-15952490 ] Amrit Sarkar commented on LUCENE-7729: -- :) I looked into SimplePatternTokenizer and how it does pattern matching utilising a deterministic finite-state automaton. CharacterRunAutomaton is the fundamental building block for the hypothetical PatternBreakIterator. It should not be much work, considering everything has already been implemented very extensively and SimplePatternTokenizer provides a perfect example. I will try to devise something out of it and update soon. > Support for string type separator for CustomSeparatorBreakIterator > -- > > Key: LUCENE-7729 > URL: https://issues.apache.org/jira/browse/LUCENE-7729 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/highlighter >Reporter: Amrit Sarkar > Attachments: LUCENE-7729.patch, LUCENE-7729.patch > > > LUCENE-6485: currently CustomSeparatorBreakIterator breaks the text when the > _char_ passed is found. > Improved CustomSeparatorBreakIterator: it now supports a separator of string > type of arbitrary length.
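The automaton idea in the comment above can be illustrated without Lucene: conceptually, CharacterRunAutomaton advances a deterministic automaton one character at a time, and for a literal string separator the equivalent machine is the classic KMP failure-function automaton. The sketch below is a hedged, dependency-free illustration with invented names; it is not the Lucene API and not the patch under discussion.

```java
// Dependency-free sketch of a DFA that finds a multi-character separator
// while scanning text one char at a time, the way a hypothetical
// PatternBreakIterator built on CharacterRunAutomaton would. The state is
// the number of separator characters matched so far; the failure table
// handles separators whose prefixes repeat (e.g. "acab" inside "acacab").
public class SeparatorAutomatonSketch {

    // KMP failure table: fail[i] = length of the longest proper prefix of
    // sep[0..i] that is also a suffix of it.
    static int[] failureTable(String sep) {
        int[] fail = new int[sep.length()];
        for (int i = 1, k = 0; i < sep.length(); i++) {
            while (k > 0 && sep.charAt(i) != sep.charAt(k)) k = fail[k - 1];
            if (sep.charAt(i) == sep.charAt(k)) k++;
            fail[i] = k;
        }
        return fail;
    }

    // One DFA transition: given the current state and the next char,
    // return the new state (sep.length() means "separator found").
    static int step(String sep, int[] fail, int state, char c) {
        while (state > 0 && sep.charAt(state) != c) state = fail[state - 1];
        return sep.charAt(state) == c ? state + 1 : 0;
    }

    // Returns the index just past the first occurrence of sep, or -1.
    static int firstBoundary(String text, String sep) {
        int[] fail = failureTable(sep);
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            state = step(sep, fail, state, text.charAt(i));
            if (state == sep.length()) return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Finds "acab" ending at index 6 of "acacab" -- the overlapping
        // case that a naive prefix-skipping scan misses.
        System.out.println(firstBoundary("acacab", "acab")); // prints 6
    }
}
```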
[jira] [Comment Edited] (LUCENE-7729) Support for string type separator for CustomSeparatorBreakIterator
[ https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937734#comment-15937734 ] Amrit Sarkar edited comment on LUCENE-7729 at 4/1/17 11:56 PM: --- bq. len > 0 (as a comment) but in all cases you probably mean len > 1? Yes, that is correct. bq. Let me give a better example of length 3: aab would fail to match aaab. I just wrote a test for that to confirm it failed. Here's another example of length 4 that may be more clear: A separator of acab would fail to be detected in acacab. I see. The implementation is flawed; the algorithm I had in mind is incomplete, though some minor tweaking should make it work. I never considered a repetitive pattern in the separator. bq. To be clear, I never asked or recommended. David, I completely understand and am aware; I just pointed out the conversation which motivated me to look into it. I am thankful to you for taking the time to provide healthy insights and feedback on the patch. I will not get discouraged if some of my work doesn't get into the main project; I want to contribute something useful, not flawed. With that, I will check out SimplePatternTokenizer and the Automaton part. Thank you for your time again; I really appreciate it. Should I leave this JIRA as it is, or at least fix the implementation? was (Author: sarkaramr...@gmail.com): bq. len > 0 (as a comment) but in all cases you probably mean len > 1? Yes, that is correct. bq. Let me give a better example of length 3: aab would fail to match aaab. I just wrote a test for that to confirm it failed. Here's another example of length 4 that may be more clear: A separator of acab would fail to be detected in acacab. I see. The implemented is flawed, the algorithm I thought is incomplete and though some minor tweaking will make it work surely. I never considered repetitive pattern in the separator. bq. To be clear, I never asked or recommended. 
David, I completely understand and aware, I just pointed out the conversation which motivates me to look into it. I am thankful to you for taking your time out to provide healthy insights and feedback on the patch. I will not get discouraged if some of my work doesn't get into the main project, even I want to contribute which is useful not flawed. With that, I will check out SimplePatternTokenizer and the Automation part. Thank you for your time again, really appreciate that. Should I leave this JIRA as it is? or instead atleast fix the implementation? > Support for string type separator for CustomSeparatorBreakIterator > -- > > Key: LUCENE-7729 > URL: https://issues.apache.org/jira/browse/LUCENE-7729 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/highlighter >Reporter: Amrit Sarkar > Attachments: LUCENE-7729.patch, LUCENE-7729.patch > > > LUCENE-6485: currently CustomSeparatorBreakIterator breaks the text when the > _char_ passed is found. > Improved CustomSeparatorBreakIterator: it now supports a separator of string > type of arbitrary length.
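The failure mode discussed in this thread — a separator of acab going undetected in acacab — comes from restarting the scan *after* the matched prefix instead of backtracking into it. A minimal sketch, with invented method names, reproducing the bug next to Java's correct built-in search:

```java
// Demonstrates the overlap bug: a scan that, on mismatch, skips past all
// characters matched so far misses separators whose prefix repeats inside
// the text. Names are illustrative, not taken from the patch.
public class OverlapBugSketch {

    // Flawed scan: after a partial match fails, it restarts AFTER the
    // matched prefix, so an occurrence overlapping that prefix is lost.
    static int naiveIndexOf(String text, String sep) {
        int i = 0;
        while (i <= text.length() - sep.length()) {
            int matched = 0;
            while (matched < sep.length() && text.charAt(i + matched) == sep.charAt(matched)) {
                matched++;
            }
            if (matched == sep.length()) return i;
            i += Math.max(1, matched); // skips the overlap -- this is the bug
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "acacab", sep = "acab";
        System.out.println(naiveIndexOf(text, sep)); // prints -1: misses it
        System.out.println(text.indexOf(sep));       // prints 2: correct
    }
}
```

Java's String.indexOf (and the automaton approach suggested in the comments) handles the overlap because it effectively restarts one character past the previous start, not past the whole matched prefix.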
[jira] [Resolved] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration
[ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexandre Rafalovitch resolved SOLR-9601. - Resolution: Fixed Fix Version/s: 6.6 master (7.0) > DIH: Radically simplify Tika example to only show relevant configuration > - > > Key: SOLR-9601 > URL: https://issues.apache.org/jira/browse/SOLR-9601 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler, contrib - Solr Cell (Tika > extraction) >Affects Versions: 6.x, master (7.0) >Reporter: Alexandre Rafalovitch >Assignee: Alexandre Rafalovitch > Labels: examples, usability > Fix For: master (7.0), 6.6 > > Attachments: tika2_20170308.tgz, tika2_20170316.tgz > > > Solr DIH examples are legacy examples to show how DIH works. However, they > include full configurations that may obscure teaching points. This is no > longer needed as we have 3 full-blown examples in the configsets. > Specifically for Tika, the field type definitions were at some point > simplified to have fewer support files in the configuration directory. This, > however, means that we now have field definitions that have the same names as > other examples, but different definitions. > Importantly, Tika does not use most (any?) of those modified definitions. > They are there just for completeness. Similarly, the solrconfig.xml includes > the extract handler even though we are demonstrating a different path of using > Tika. Somebody grepping through config files may get confused about what > configuration aspects contribute to what experience. > I am planning to significantly simplify the configuration and schema of the Tika > example to **only** show the DIH Tika extraction path. It will end up as a very > short and focused example.
[jira] [Commented] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration
[ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952457#comment-15952457 ] ASF subversion and git services commented on SOLR-9601: --- Commit 812b0eebf3d50a141b952af27bbf7c225df5072d in lucene-solr's branch refs/heads/branch_6x from [~arafalov] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=812b0ee ] SOLR-9601: DIH Tika example is now minimal. Only keep definitions and files required to show Tika-extraction in DIH > DIH: Radically simplify Tika example to only show relevant configuration > - > > Key: SOLR-9601 > URL: https://issues.apache.org/jira/browse/SOLR-9601 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler, contrib - Solr Cell (Tika > extraction) >Affects Versions: 6.x, master (7.0) >Reporter: Alexandre Rafalovitch >Assignee: Alexandre Rafalovitch > Labels: examples, usability > Attachments: tika2_20170308.tgz, tika2_20170316.tgz > > > Solr DIH examples are legacy examples to show how DIH works. However, they > include full configurations that may obscure teaching points. This is no > longer needed as we have 3 full-blown examples in the configsets. > Specifically for Tika, the field type definitions were at some point > simplified to have fewer support files in the configuration directory. This, > however, means that we now have field definitions that have the same names as > other examples, but different definitions. > Importantly, Tika does not use most (any?) of those modified definitions. > They are there just for completeness. Similarly, the solrconfig.xml includes > the extract handler even though we are demonstrating a different path of using > Tika. Somebody grepping through config files may get confused about what > configuration aspects contribute to what experience. 
> I am planning to significantly simplify the configuration and schema of the Tika > example to **only** show the DIH Tika extraction path. It will end up as a very > short and focused example.
[jira] [Commented] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration
[ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952454#comment-15952454 ] ASF subversion and git services commented on SOLR-9601: --- Commit b02626de5071c543eb6e8deea450266218238c9e in lucene-solr's branch refs/heads/master from [~arafalov] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b02626d ] SOLR-9601: DIH Tika example is now minimal Only keep definitions and files required to show Tika-extraction in DIH > DIH: Radically simplify Tika example to only show relevant configuration > - > > Key: SOLR-9601 > URL: https://issues.apache.org/jira/browse/SOLR-9601 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler, contrib - Solr Cell (Tika > extraction) >Affects Versions: 6.x, master (7.0) >Reporter: Alexandre Rafalovitch >Assignee: Alexandre Rafalovitch > Labels: examples, usability > Attachments: tika2_20170308.tgz, tika2_20170316.tgz > > > Solr DIH examples are legacy examples to show how DIH works. However, they > include full configurations that may obscure teaching points. This is no > longer needed as we have 3 full-blown examples in the configsets. > Specifically for Tika, the field type definitions were at some point > simplified to have fewer support files in the configuration directory. This, > however, means that we now have field definitions that have the same names as > other examples, but different definitions. > Importantly, Tika does not use most (any?) of those modified definitions. > They are there just for completeness. Similarly, the solrconfig.xml includes > the extract handler even though we are demonstrating a different path of using > Tika. Somebody grepping through config files may get confused about what > configuration aspects contribute to what experience. 
> I am planning to significantly simplify the configuration and schema of the Tika > example to **only** show the DIH Tika extraction path. It will end up as a very > short and focused example.
[JENKINS] Lucene-Solr-6.x-MacOSX (64bit/jdk1.8.0) - Build # 800 - Unstable!
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/800/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.cloud.ShardSplitTest.testSplitWithChaosMonkey Error Message: There are still nodes recoverying - waited for 330 seconds Stack Trace: java.lang.AssertionError: There are still nodes recoverying - waited for 330 seconds at __randomizedtesting.SeedInfo.seed([A7618497EB801B86:2C465746AA86B002]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:187) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:144) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:139) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForRecoveriesToFinish(AbstractFullDistribZkTestBase.java:865) at org.apache.solr.cloud.ShardSplitTest.testSplitWithChaosMonkey(ShardSplitTest.java:437) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:992) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:967) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3936 - Still Unstable!
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3936/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test Error Message: Could not find collection:collection2 Stack Trace: java.lang.AssertionError: Could not find collection:collection2 at __randomizedtesting.SeedInfo.seed([FEA3917DBE5D9B57:76F7AEA710A1F6AF]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:159) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:144) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:139) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForRecoveriesToFinish(AbstractFullDistribZkTestBase.java:870) at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.testIndexingBatchPerRequestWithHttpSolrClient(FullSolrCloudDistribCmdsTest.java:620) at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test(FullSolrCloudDistribCmdsTest.java:152) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957) at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (LUCENE-7745) Explore GPU acceleration
[ https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952388#comment-15952388 ] Ishan Chattopadhyaya commented on LUCENE-7745: -- Hi Vikash, I suggest you read the student manuals for GSoC. Submit a proposal describing how you want to approach this project, including technical details (as much as possible) and detailed timelines. Regarding the following:
{code}
1. First, understand how BooleanScorer calls these similarity classes and does the scoring. There are unit tests in Lucene that can help you get there. This might help: https://wiki.apache.org/lucene-java/HowToContribute
2. Write a standalone CUDA/OpenCL project that does the same processing on the GPU.
3. Benchmark the speed of doing so on the GPU vs. the speed observed when doing the same through the BooleanScorer. Preferably, on a large result set. Include the time for copying results and scores in and out of the device memory from/to the main memory.
4. Optimize step 2, if possible.
{code}
If you've already understood step 1, feel free to make a proposal on how you will use your GSoC coding time to achieve steps 2-4. Also, you can look at other stretch goals to be included in the coding time. I would consider steps 2-4, if done properly and successfully, to be a good GSoC contribution in themselves. And if these steps are done properly, then either Lucene integration can be proposed for the latter part of the coding phase (last 2-3 weeks, I'd think), or exploratory work on another part of Lucene (apart from the BooleanScorer, e.g. spatial search filtering etc.) could be taken up. Time is running out, so kindly submit a proposal as soon as possible. You can submit a draft first, have one of us review it, and then submit it as final after the review. If the deadline is too close, there might not be enough time for this round of review; in such a case just submit the draft as final. 
Also, remember a lot of the GPGPU coding is done in C, so familiarity/experience with that is a plus. (Just a suggestion that makes sense to me, and feel free to ignore: bullet points work better than long paragraphs, even though the length of sentences can remain the same.) > Explore GPU acceleration > > > Key: LUCENE-7745 > URL: https://issues.apache.org/jira/browse/LUCENE-7745 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya > Labels: gsoc2017, mentor > > There are parts of Lucene that can potentially be sped up if computations > were offloaded from the CPU to the GPU(s). With commodity GPUs having as > high as 12GB of high-bandwidth RAM, we might be able to leverage GPUs to > speed up parts of Lucene (indexing, search). > The first thing that comes to mind is spatial filtering, which is traditionally known > to be a good candidate for GPU-based speedup (esp. when complex polygons are > involved). In the past, Mike McCandless has mentioned that "both initial > indexing and merging are CPU/IO intensive, but they are very amenable to > soaking up the hardware's concurrency." > I'm opening this issue as an exploratory task, suitable for a GSoC project. I > volunteer to mentor any GSoC student willing to work on this summer.
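Step 3's benchmarking methodology can be sketched with a plain timing harness. This is only a hedged illustration: the scoring formula below is a stand-in, not Lucene's BooleanScorer, and a real GPU comparison would additionally time the host-to-device and device-to-host copies that the comment says must be included.

```java
// Minimal CPU-side timing harness for the kind of benchmark described in
// step 3. The "scoring" here is an invented per-document weighted formula,
// used only so there is something measurable; a GPU variant would wrap the
// kernel launch plus both memory copies inside the same timed region.
public class ScoringBenchSketch {

    // CPU baseline: a simple per-document score (stand-in for BooleanScorer).
    static float[] scoreOnCpu(float[] tf, float boost) {
        float[] scores = new float[tf.length];
        for (int doc = 0; doc < tf.length; doc++) {
            scores[doc] = boost * (float) Math.sqrt(tf[doc]);
        }
        return scores;
    }

    // Time a task end to end in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        float[] tf = new float[1_000_000];
        java.util.Arrays.fill(tf, 3f);
        scoreOnCpu(tf, 2f); // warm up the JIT before measuring
        long cpu = timeNanos(() -> scoreOnCpu(tf, 2f));
        System.out.println("cpu scoring: " + cpu / 1_000_000.0 + " ms");
    }
}
```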
[jira] [Commented] (LUCENE-7745) Explore GPU acceleration
[ https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952375#comment-15952375 ] vikash commented on LUCENE-7745: Hello all, I have been reading a lot about how GPUs work and about GPU parallelization, in particular General-Purpose computing on Graphics Processing Units, and have also looked in detail into the source code of the BooleanScorer.java file. It is a nice piece of code, and I am having no difficulty understanding how it works, since Java is my speciality, so the job was quite fun. There are a few things that seem unclear to me, but I am reading and experimenting, so I will resolve them soon. It is a nice idea to use the GPU to perform the search and indexing operations on a document; that would be faster on the GPU. Regarding the licensing issue: since we are generating code, and since, as was said above, the code we generate may not necessarily go into Lucene, would licensing still be an issue if we use the libraries in our code? As Uwe Schindler said, we may host the code on GitHub, and it would certainly not be a good idea to develop code for special hardware, but we can still give it a try and attempt to make it compatible with most hardware. I don't mind if this code does not go into Lucene, but we can try to change Lucene and make it better; I am preparing myself for it, and things would stay on track with your kind mentorship. So should I submit my proposal now, or do I need to complete all four steps that Ishan listed in the last comment and then submit my proposal? And which of the ideas would you prefer to mentor me on, that is, which one do you think would be the better one to continue with? >Copy over and index lots of points and corresponding docids to the GPU as an >offline, one time operation. Then, given a query point, return top-n nearest >indexed points. 
>Copy over and index lots of points and corresponding docids to the GPU as an >offline, one time operation. Then, given a polygon (complex shape), return all >points that lie inside the polygon. >Benchmarking an aggregation over a DocValues field and comparing the >corresponding performance when executed on the GPU. >Benchmarking the speed of calculations on GPU vs. speed observed when doing >the same through the BooleanScorer. Preferably, on a large result set with the >time for copying results and scores in and out of the device memory from/to >the main memory included? -Vikash > Explore GPU acceleration > > > Key: LUCENE-7745 > URL: https://issues.apache.org/jira/browse/LUCENE-7745 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya > Labels: gsoc2017, mentor > > There are parts of Lucene that can potentially be speeded up if computations > were to be offloaded from CPU to the GPU(s). With commodity GPUs having as > high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to > speed parts of Lucene (indexing, search). > First that comes to mind is spatial filtering, which is traditionally known > to be a good candidate for GPU based speedup (esp. when complex polygons are > involved). In the past, Mike McCandless has mentioned that "both initial > indexing and merging are CPU/IO intensive, but they are very amenable to > soaking up the hardware's concurrency." > I'm opening this issue as an exploratory task, suitable for a GSoC project. I > volunteer to mentor any GSoC student willing to work on this this summer. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
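One way to make the first benchmark task above concrete: a plain-Java CPU baseline that, given a query point, returns the docids of the top-n nearest indexed points. This is only an illustrative sketch (the class and method names are invented, not Lucene APIs); a GPU implementation would be measured against something like it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// CPU baseline for the first benchmark task: given a query point,
// return the docids of the n nearest indexed points. Class and method
// names are invented for illustration; this is not a Lucene API.
public class NearestPoints {

    // Brute force over all points; the distance loop is the part a GPU
    // implementation would parallelize across thousands of threads.
    public static List<Integer> topN(double[] xs, double[] ys, int[] docIds,
                                     double qx, double qy, int n) {
        // Max-heap on squared distance so the current worst candidate
        // sits on top and is evicted when a closer point arrives.
        PriorityQueue<double[]> heap = new PriorityQueue<>(
            Comparator.comparingDouble((double[] e) -> e[0]).reversed());
        for (int i = 0; i < docIds.length; i++) {
            double dx = xs[i] - qx, dy = ys[i] - qy;
            heap.offer(new double[] {dx * dx + dy * dy, docIds[i]});
            if (heap.size() > n) {
                heap.poll(); // drop the farthest of the n+1 candidates
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, (int) heap.poll()[1]); // nearest point first
        }
        return result;
    }
}
```

A fair GPU-vs-CPU comparison would, as suggested above, also include the time to copy the points in and the top-n results back out of device memory.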
[jira] [Updated] (SOLR-9745) SolrCLI swallows errors from solr.cmd
[ https://issues.apache.org/jira/browse/SOLR-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-9745: --- Summary: SolrCLI swallows errors from solr.cmd (was: bin/solr* swallows errors from running example instances at least) > SolrCLI swallows errors from solr.cmd > - > > Key: SOLR-9745 > URL: https://issues.apache.org/jira/browse/SOLR-9745 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 6.3, master (7.0) >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev > Labels: newbie, newdev > Attachments: SOLR-9745.patch, SOLR-9745.patch > > > It occurs in a mad scenario in LUCENE-7534: > * solr.cmd wasn't granted +x (it happens under Cygwin, yes) > * a coolhacker worked around it with cmd /C solr.cmd start -e .. > * but when SolrCLI runs Solr instances with the same solr.cmd, it just > silently fails > I think we can just pass an ExecuteResultHandler which will dump the exception to the > console.
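A minimal sketch of the fix proposed above: a result handler that dumps failures from solr.cmd to the console instead of swallowing them. The interface below mirrors the shape of Apache Commons Exec's ExecuteResultHandler but is declared locally so the sketch is self-contained; this is illustrative, not the actual SolrCLI code.

```java
// Sketch of the proposed fix: give SolrCLI a result handler that makes
// failures from solr.cmd visible. The interface mirrors the shape of
// Apache Commons Exec's ExecuteResultHandler but is declared locally so
// the example is self-contained; it is not the actual SolrCLI code.
public class DumpingResultHandler {

    interface ExecuteResultHandler {
        void onProcessComplete(int exitValue);
        void onProcessFailed(Exception e);
    }

    static class ConsoleDumpingHandler implements ExecuteResultHandler {
        int lastExitValue = Integer.MIN_VALUE;
        Exception lastFailure;

        @Override
        public void onProcessComplete(int exitValue) {
            lastExitValue = exitValue;
        }

        @Override
        public void onProcessFailed(Exception e) {
            // The whole point: dump the failure instead of staying silent.
            lastFailure = e;
            System.err.println("Failed to run solr.cmd: " + e.getMessage());
            e.printStackTrace(System.err);
        }
    }
}
```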
[jira] [Resolved] (SOLR-7383) DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo possible
[ https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexandre Rafalovitch resolved SOLR-7383. - Resolution: Fixed Fix Version/s: 6.6 master (7.0) > DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo > possible > > > Key: SOLR-7383 > URL: https://issues.apache.org/jira/browse/SOLR-7383 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 5.0, 6.0 >Reporter: Upayavira >Assignee: Alexandre Rafalovitch >Priority: Minor > Fix For: master (7.0), 6.6 > > Attachments: atom_20170315.tgz, rss-data-config.xml, SOLR-7383.patch > > > The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) > is broken again. See associated issues. > Below is a config that should work. > This is caused by Slashdot seemingly oscillating between RDF/RSS and pure > RSS. Perhaps we should depend upon something more static, rather than an > external service that is free to change as it desires. > {code:xml} > > > > pk="link" > url="http://rss.slashdot.org/Slashdot/slashdot; > processor="XPathEntityProcessor" > forEach="/RDF/item" > transformer="DateFormatTransformer"> > > commonField="true" /> > commonField="true" /> > commonField="true" /> > > > > > > > dateTimeFormat="-MM-dd'T'HH:mm:ss" /> > > > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7383) DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo possible
[ https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952335#comment-15952335 ] ASF subversion and git services commented on SOLR-7383: --- Commit e987654aa31554fd27f3110d7def3eb782e5c199 in lucene-solr's branch refs/heads/branch_6x from [~arafalov] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e987654 ] SOLR-7383: Replace DIH 'rss' example with 'atom' rss example was broken for multiple reasons. atom example showcases the same - and more - features and uses the smallest config file needed to make it work. > DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo > possible > > > Key: SOLR-7383 > URL: https://issues.apache.org/jira/browse/SOLR-7383 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 5.0, 6.0 >Reporter: Upayavira >Assignee: Alexandre Rafalovitch >Priority: Minor > Attachments: atom_20170315.tgz, rss-data-config.xml, SOLR-7383.patch > > > The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) > is broken again. See associated issues. > Below is a config that should work. > This is caused by Slashdot seemingly oscillating between RDF/RSS and pure > RSS. Perhaps we should depend upon something more static, rather than an > external service that is free to change as it desires. > {code:xml} > > > > pk="link" > url="http://rss.slashdot.org/Slashdot/slashdot; > processor="XPathEntityProcessor" > forEach="/RDF/item" > transformer="DateFormatTransformer"> > > commonField="true" /> > commonField="true" /> > commonField="true" /> > > > > > > > dateTimeFormat="-MM-dd'T'HH:mm:ss" /> > > > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7383) DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo possible
[ https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952321#comment-15952321 ] ASF subversion and git services commented on SOLR-7383: --- Commit 580f6e98fb033dbbb8e0921fc3175021714ce956 in lucene-solr's branch refs/heads/master from [~arafalov] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=580f6e9 ] SOLR-7383: Replace DIH 'rss' example with 'atom' rss example was broken for multiple reasons. atom example showcases the same - and more - features and uses the smallest config file needed to make it work. > DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo > possible > > > Key: SOLR-7383 > URL: https://issues.apache.org/jira/browse/SOLR-7383 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 5.0, 6.0 >Reporter: Upayavira >Assignee: Alexandre Rafalovitch >Priority: Minor > Attachments: atom_20170315.tgz, rss-data-config.xml, SOLR-7383.patch > > > The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) > is broken again. See associated issues. > Below is a config that should work. > This is caused by Slashdot seemingly oscillating between RDF/RSS and pure > RSS. Perhaps we should depend upon something more static, rather than an > external service that is free to change as it desires. > {code:xml} > > > > pk="link" > url="http://rss.slashdot.org/Slashdot/slashdot; > processor="XPathEntityProcessor" > forEach="/RDF/item" > transformer="DateFormatTransformer"> > > commonField="true" /> > commonField="true" /> > commonField="true" /> > > > > > > > dateTimeFormat="-MM-dd'T'HH:mm:ss" /> > > > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests
[ https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952316#comment-15952316 ] Erick Erickson commented on SOLR-10229: --- bq: If you look closely at the public methods exposed to be used, all are static and h.getCore each time will fetch the current test suite's core and its schema, which is correct, no? h.getCore() is overly restrictive and doesn't support having more than one core open and modifying the schema. The problem is that it fetches _the_ test core, which is limiting. It's convenient for writing tests that only operate on a single core; for more complex situations it's quite restrictive. Take a look at, for instance, TestLazyCores. It has to do some fancy dancing, but it opens multiple cores, so it has to bypass h.getCore() completely. Admittedly they all use the same schema, but that doesn't matter, since if I wanted each of those cores to have new field definitions I couldn't use h.getCore(), even implicitly. Even if all the new field definitions were the same. bq: ...different cores with different schemas in the same test in our test-suites... Are there such use cases? Not that I know of offhand, but that doesn't mean anything really, there's a _lot_ of test code ;). It's unnecessarily restrictive to confine ourselves to that paradigm though. And as above, using h.getCore() doesn't allow modifying schemas for more than one core in any given test. bq: I will do repetitive forced testing for two or more test suites simultaneously and observe what's happening. This isn't quite the issue. If we try to persist _anything_ to the "source tree", which includes all the config files in this case, the test framework should throw an exception. I'm not worried about multiple cores making modifications to the on-disk files; _no_ mods should be allowed unless the configs are in a temp dir. 
You'll see lots of code like this (again from TestLazyCores since I know that code):
{code:java}
solrHomeDirectory = createTempDir().toFile();
File coreRoot = new File(solrHomeDirectory, coreName);
copyMinConf(coreRoot, "name=" + coreName);
{code}
So having the temp dir (which is automagically cleaned up by the test harness) is required to change anything on-disk, but merely using this new approach shouldn't require creating a tmp dir and copying stuff to it. > See what it would take to shift many of our one-off schemas used for testing > to managed schema and construct them as part of the tests > -- > > Key: SOLR-10229 > URL: https://issues.apache.org/jira/browse/SOLR-10229 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Minor > Attachments: SOLR-10229.patch > > > The test schema files are intimidating. There are about a zillion of them, > and making a change in any of them risks breaking some _other_ test. That > leaves people three choices: > 1> add what they need to some existing schema. Which makes schemas bigger and > bigger and bigger. > 2> create a new schema file, adding to the proliferation thereof. > 3> Look through all the existing tests to see if they have something that > works. > The recent work on LUCENE-7705 is a case in point. We're adding a maxLen > parameter to some tokenizers. Putting those parameters into any of the > existing schemas, especially to test < 255 char tokens, is virtually > guaranteed to break other tests, so the only safe thing to do is make another > schema file. Adding to the multiplication of files. > As part of SOLR-5260 I tried creating the schema on the fly rather than > creating a new static schema file and it's not hard. WDYT about making this > into some better thought-out utility? > At present, this is pretty fuzzy, I wanted to get some reactions before > putting much effort into it. 
I expect that the utility methods would > eventually get a bunch of canned types. It's reasonably straightforward for > primitive types, if lengthy. But when you get into solr.TextField-based types > it gets less straight-forward. > We could manage to just move the "intimidation" from the plethora of schema > files to a zillion fieldTypes in the utility to choose from... > Also, forcing every test to define the fields up-front is arguably less > convenient than just having _some_ canned schemas we can use. And erroneous > schemas to test failure modes are probably not very good fits for any such > framework. > [~steve_rowe] and [~hossman_luc...@fucit.org] in particular might have > something to say. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
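The temp-dir discipline described above can be sketched with only the JDK: every on-disk change happens under a throwaway directory, never the source tree. createTempDir() and copyMinConf() from the Solr test framework play the roles of the stdlib stand-ins used here; the names and file layout are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Stdlib-only sketch of the temp-dir pattern: all config setup lives in
// a throwaway directory so nothing is ever persisted to the source tree.
// Names and layout are illustrative, not the Solr test framework's.
public class TempCoreSetup {

    public static Path setUpCore(String coreName) {
        try {
            // Stand-in for createTempDir(); the harness would clean this up.
            Path solrHome = Files.createTempDirectory("solrHome");
            Path coreRoot = solrHome.resolve(coreName);
            Files.createDirectories(coreRoot.resolve("conf"));
            // Stand-in for copyMinConf(coreRoot, "name=" + coreName):
            Files.writeString(coreRoot.resolve("core.properties"),
                              "name=" + coreName + "\n");
            return coreRoot;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```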
[jira] [Commented] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests
[ https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952298#comment-15952298 ] Amrit Sarkar commented on SOLR-10229: - Thank you for the correction and the suggestions. bq. It looks like you're thinking to have test classes subclass this. Could it be instantiated as a static member of SolrTestCaseJ4 somehow? I think that's less confusing and all current tests would immediately have access. The only thing I see on a quick glance that really requires SolrTestCaseJ4 is h.getCore(), so that would probably mean we need to pass the core in to the methods that need it. If you look closely at the public methods exposed to be used, all are static, and h.getCore each time will fetch the current test suite's core and its schema, which is correct, no? Developers will directly access these methods without inheriting or creating a framework object. bq. Using h.getCore() doesn't accommodate having different cores with different schemas in the same test. I am not very aware of _different cores with different schemas in the same test_ in our test suites. Are there such use cases? I will look for them. bq. I doubt we should persist any changes. Makes sense. I will do repetitive forced testing for two or more test suites simultaneously and observe what's happening. I am making the necessary changes to the patch already attached and completing the rest; will update soon. > See what it would take to shift many of our one-off schemas used for testing > to managed schema and construct them as part of the tests > -- > > Key: SOLR-10229 > URL: https://issues.apache.org/jira/browse/SOLR-10229 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Minor > Attachments: SOLR-10229.patch > > > The test schema files are intimidating. 
There are about a zillion of them, > and making a change in any of them risks breaking some _other_ test. That > leaves people three choices: > 1> add what they need to some existing schema. Which makes schemas bigger and > bigger and bigger. > 2> create a new schema file, adding to the proliferation thereof. > 3> Look through all the existing tests to see if they have something that > works. > The recent work on LUCENE-7705 is a case in point. We're adding a maxLen > parameter to some tokenizers. Putting those parameters into any of the > existing schemas, especially to test < 255 char tokens is virtually > guaranteed to break other tests, so the only safe thing to do is make another > schema file. Adding to the multiplication of files. > As part of SOLR-5260 I tried creating the schema on the fly rather than > creating a new static schema file and it's not hard. WDYT about making this > into some better thought-out utility? > At present, this is pretty fuzzy, I wanted to get some reactions before > putting much effort into it. I expect that the utility methods would > eventually get a bunch of canned types. It's reasonably straightforward for > primitive types, if lengthy. But when you get into solr.TextField-based types > it gets less straight-forward. > We could manage to just move the "intimidation" from the plethora of schema > files to a zillion fieldTypes in the utility to choose from... > Also, forcing every test to define the fields up-front is arguably less > convenient than just having _some_ canned schemas we can use. And erroneous > schemas to test failure modes are probably not very good fits for any such > framework. > [~steve_rowe] and [~hossman_luc...@fucit.org] in particular might have > something to say. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
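The API shape being debated in the comments above (static schema-building helpers that take the core as an explicit parameter, rather than fetching the single test core via h.getCore()) might look roughly like this. SolrCoreLike and the field map are invented stand-ins for illustration, not Solr classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the API shape under discussion: helpers that stay static (as
// in the patch) but take the core as an explicit parameter instead of
// fetching the single test core via h.getCore(), so one test can drive
// several cores with different schemas. SolrCoreLike and the field map
// are invented stand-ins, not Solr classes.
public class SchemaHelpers {

    static class SolrCoreLike {
        final String name;
        final Map<String, String> fields = new HashMap<>(); // field -> type

        SolrCoreLike(String name) {
            this.name = name;
        }
    }

    // Explicit about which core's schema is being modified; two cores in
    // the same test can diverge without any shared static state.
    static void addField(SolrCoreLike core, String field, String type) {
        core.fields.put(field, type);
    }
}
```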
[jira] [Comment Edited] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP
[ https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952283#comment-15952283 ] Joel Bernstein edited comment on SOLR-10351 at 4/1/17 3:46 PM: --- bq. Wouldn't the NLP processing as advertised in the title of this issue be most likely to put its processing into analysis attributes? This stream evaluator only emits the character data attribute. Possibly. I definitely have much to learn about the analysis chain. In the first pass I was mostly interested in getting the token stream from the analysis chain. What I had envisioned in the future was having analysis chains that perform sentence chunking, entity extraction, noun phrase extraction, etc. I was seeing these as finished token streams. But exposing the analysis attributes would seem to make sense in the future. bq. BTW Please use try-finally (even try-with-resources style) to close token-streams wherever possible. Analyzer internal parts are internally shared in thread-locals and the ramifications can be nasty on the entire Solr node if at any time one filter has a bug or something on a particular value. Your Solr node then becomes poisoned in a sense and only a restart will fix the ailment. Will do. was (Author: joel.bernstein): bq. Wouldn't the NLP processing as advertised in the title of this issue be most likely to put its processing into analysis attributes? This stream evaluator only emits the character data attribute. Possibly. I definitely have much to learn about the analysis chain. In the first pass I was mostly interested in getting the token stream from the analysis chain. What I had envisioned in the future was having token streams that perform sentence chunking, entity extraction, noun phrase extraction, etc. I was seeing these as finished token streams. But exposing the analysis attributes would seem to make sense in the future. bq. 
BTW Please use try-finally (even try-with-resources style) to close token-streams wherever possible. Analyzer internal parts are internally shared in thread-locals and the ramifications can be nasty on the entire Solr node if at any time one filter has a bug or something on a particular value. Your Solr node then becomes poisoned in a sense and only a restart will fix the ailment. Will do. > Add analyze Stream Evaluator to support streaming NLP > - > > Key: SOLR-10351 > URL: https://issues.apache.org/jira/browse/SOLR-10351 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Labels: NLP, Streaming > Fix For: 6.6 > > Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, > SOLR-10351.patch > > > The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of > tokens from a *text field*. The collection of tokens can then be streamed out > by the *cartesianProduct* Streaming Expression or attached to documents as > multi-valued fields by the *select* Streaming Expression. > This allows Streaming Expressions to leverage all the existing tokenizers and > filters and provides a place for future NLP analyzers to be added to > Streaming Expressions. > Sample syntax: > {code} > cartesianProduct(expr, analyze(analyzerField, textField) as outfield ) > {code} > {code} > select(expr, analyze(analyzerField, textField) as outfield ) > {code} > Combined with Solr's batch text processing capabilities this provides an > entire parallel NLP framework. Solr's batch processing capabilities are > described here: > *Batch jobs, Parallel ETL and Streaming Text Transformation* > http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP
[ https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952283#comment-15952283 ] Joel Bernstein commented on SOLR-10351: --- bq. Wouldn't the NLP processing as advertised in the title of this issue be most likely to put its processing into analysis attributes? This stream evaluator only emits the character data attribute. Possibly. I definitely have much to learn about the analysis chain. In the first pass I was mostly interested in getting the token stream from the analysis chain. What I had envisioned in the future was having token streams that perform sentence chunking, entity extraction, noun phrase extraction, etc. I was seeing these as finished token streams. But exposing the analysis attributes would seem to make sense in the future. bq. BTW Please use try-finally (even try-with-resources style) to close token-streams wherever possible. Analyzer internal parts are internally shared in thread-locals and the ramifications can be nasty on the entire Solr node if at any time one filter has a bug or something on a particular value. Your Solr node then becomes poisoned in a sense and only a restart will fix the ailment. Will do. > Add analyze Stream Evaluator to support streaming NLP > - > > Key: SOLR-10351 > URL: https://issues.apache.org/jira/browse/SOLR-10351 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Labels: NLP, Streaming > Fix For: 6.6 > > Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, > SOLR-10351.patch > > > The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of > tokens from a *text field*. The collection of tokens can then be streamed out > by the *cartesianProduct* Streaming Expression or attached to documents as > multi-valued fields by the *select* Streaming Expression. 
> This allows Streaming Expressions to leverage all the existing tokenizers and > filters and provides a place for future NLP analyzers to be added to > Streaming Expressions. > Sample syntax: > {code} > cartesianProduct(expr, analyze(analyzerField, textField) as outfield ) > {code} > {code} > select(expr, analyze(analyzerField, textField) as outfield ) > {code} > Combined with Solr's batch text processing capabilities this provides an > entire parallel NLP framework. Solr's batch processing capabilities are > described here: > *Batch jobs, Parallel ETL and Streaming Text Transformation* > http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP
[ https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952245#comment-15952245 ] David Smiley commented on SOLR-10351: - Wouldn't the NLP processing as advertised in the title of this issue be most likely to put its processing into analysis _attributes_? This stream evaluator only emits the character data attribute. BTW Please use try-finally (even try-with-resources style) to close token-streams wherever possible. Analyzer internal parts are internally shared in thread-locals and the ramifications can be nasty on the entire Solr node if at any time one filter has a bug or something on a particular value. Your Solr node then becomes poisoned in a sense and only a restart will fix the ailment. > Add analyze Stream Evaluator to support streaming NLP > - > > Key: SOLR-10351 > URL: https://issues.apache.org/jira/browse/SOLR-10351 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Labels: NLP, Streaming > Fix For: 6.6 > > Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, > SOLR-10351.patch > > > The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of > tokens from a *text field*. The collection of tokens can then be streamed out > by the *cartesianProduct* Streaming Expression or attached to documents as > multi-valued fields by the *select* Streaming Expression. > This allows Streaming Expressions to leverage all the existing tokenizers and > filters and provides a place for future NLP analyzers to be added to > Streaming Expressions. > Sample syntax: > {code} > cartesianProduct(expr, analyze(analyzerField, textField) as outfield ) > {code} > {code} > select(expr, analyze(analyzerField, textField) as outfield ) > {code} > Combined with Solr's batch text processing capabilities this provides an > entire parallel NLP framework. 
Solr's batch processing capabilities are > described here: > *Batch jobs, Parallel ETL and Streaming Text Transformation* > http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
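The try-with-resources pattern David recommends for token streams can be sketched as follows. TokenStreamLike is an invented stand-in for Lucene's TokenStream so the example is self-contained; real code would wrap the stream returned by analyzer.tokenStream(field, text) the same way, guaranteeing close() runs even when a filter throws.

```java
// Sketch of the try-with-resources pattern recommended above.
// TokenStreamLike is an invented stand-in for Lucene's TokenStream so
// the example is self-contained; it is not a Lucene API.
public class ClosingTokenStreams {

    static class TokenStreamLike implements AutoCloseable {
        boolean closed = false;
        private int remaining;

        TokenStreamLike(int tokens) {
            this.remaining = tokens;
        }

        boolean incrementToken() {
            return remaining-- > 0; // true while tokens are left
        }

        @Override
        public void close() {
            closed = true; // releases the shared thread-local analyzer state
        }
    }

    static int countTokens(TokenStreamLike ts) {
        int count = 0;
        // close() runs even if a buggy filter throws mid-stream, so one bad
        // value cannot leave the reused analyzer components "poisoned".
        try (TokenStreamLike stream = ts) {
            while (stream.incrementToken()) {
                count++;
            }
        }
        return count;
    }
}
```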
[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3935 - Still Unstable!
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3935/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC 8 tests failed. FAILED: org.apache.solr.cloud.CustomCollectionTest.testRouteFieldForHashRouter Error Message: Collection not found: routeFieldColl Stack Trace: org.apache.solr.common.SolrException: Collection not found: routeFieldColl at __randomizedtesting.SeedInfo.seed([6DD5429E60B0C9D4:C5E3DC43FFD1228E]:0) at org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(CloudSolrClient.java:1382) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1075) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1054) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160) at org.apache.solr.client.solrj.request.UpdateRequest.commit(UpdateRequest.java:233) at org.apache.solr.cloud.CustomCollectionTest.testRouteFieldForHashRouter(CustomCollectionTest.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at