[jira] [Commented] (SOLR-10393) Add UUID Stream Evaluator

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952512#comment-15952512
 ] 

ASF subversion and git services commented on SOLR-10393:


Commit 7e8272c89ec42519894b64c0ac576a1a2889bd32 in lucene-solr's branch 
refs/heads/branch_6x from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7e8272c ]

SOLR-10393: Adds UUID Streaming Evaluator


> Add UUID Stream Evaluator
> -
>
> Key: SOLR-10393
> URL: https://issues.apache.org/jira/browse/SOLR-10393
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Attachments: SOLR-10393.patch, SOLR-10393.patch
>
>
> The cartesianProduct function emits multiple tuples from a single tuple. To 
> save the cartesian product in another collection, it would be useful to be 
> able to dynamically assign new unique ids to tuples. The uuid() stream 
> evaluator will allow us to do this.
> sample syntax:
> {code}
> cartesianProduct(expr, fielda, uuid() as id)
> {code} 
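
As an illustration (not from the issue itself): combined with the update() 
stream decorator, the new ids make it possible to write the expanded tuples to 
another collection. The collection and field names below are hypothetical.

{code}
update(destCollection, batchSize=250,
  cartesianProduct(
    search(srcCollection, q="*:*", fl="id,fielda", sort="id asc"),
    fielda,
    uuid() as id))
{code}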



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10356) Add Streaming Evaluators for basic math functions

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952511#comment-15952511
 ] 

ASF subversion and git services commented on SOLR-10356:


Commit 6ce02bc693d4ef67872e9c536155c5308227d6e9 in lucene-solr's branch 
refs/heads/branch_6x from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6ce02bc ]

SOLR-10356: Adds basic math streaming evaluators


> Add Streaming Evaluators for basic math functions
> -
>
> Key: SOLR-10356
> URL: https://issues.apache.org/jira/browse/SOLR-10356
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Attachments: SOLR-10356.patch, SOLR-10356.patch, SOLR-10356.patch, 
> SOLR-10356.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10329) Rebuild Solr examples

2017-04-01 Thread Avtar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952513#comment-15952513
 ] 

Avtar Singh commented on SOLR-10329:


Hello sir, 
My name is Avtar Singh. I have previously developed a fact-based question 
answering system based on Apache Solr and Apache Lucene. I would love to work 
on this project, and I believe I can do it very efficiently.
Thank you

> Rebuild Solr examples
> -
>
> Key: SOLR-10329
> URL: https://issues.apache.org/jira/browse/SOLR-10329
> Project: Solr
>  Issue Type: Wish
>  Components: examples
>Reporter: Alexandre Rafalovitch
>  Labels: gsoc2017
>
> Apache Solr ships with a number of examples. They evolved from a kitchen-sink 
> example and are rather large. When new Solr features are added, they are 
> often shoehorned into the most appropriate example and sometimes are not 
> represented at all. 
> Often, for new users, it is hard to tell what part of an example is relevant, 
> what part is default, and what part is demonstrating something completely 
> different.
> It would take significant (and much appreciated) effort to review all the 
> examples and rebuild them to provide a clean way to showcase best practices 
> around the base and most recent features.
> Specific issues are the kitchen-sink vs. minimal-example trade-off, a better 
> approach to "schemaless" mode, and creating examples and datasets that allow 
> building both "hello world" and more advanced tutorials.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10393) Add UUID Stream Evaluator

2017-04-01 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-10393:
---
Attachment: SOLR-10393.patch

> Add UUID Stream Evaluator
> -
>
> Key: SOLR-10393
> URL: https://issues.apache.org/jira/browse/SOLR-10393
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Attachments: SOLR-10393.patch, SOLR-10393.patch
>
>
> The cartesianProduct function emits multiple tuples from a single tuple. To 
> save the cartesian product in another collection, it would be useful to be 
> able to dynamically assign new unique ids to tuples. The uuid() stream 
> evaluator will allow us to do this.
> sample syntax:
> {code}
> cartesianProduct(expr, fielda, uuid() as id)
> {code} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10393) Add UUID Stream Evaluator

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952503#comment-15952503
 ] 

ASF subversion and git services commented on SOLR-10393:


Commit ef821834d15194c2c8b626d494b5119dd42b4f9f in lucene-solr's branch 
refs/heads/master from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef82183 ]

SOLR-10393: Adds UUID Streaming Evaluator


> Add UUID Stream Evaluator
> -
>
> Key: SOLR-10393
> URL: https://issues.apache.org/jira/browse/SOLR-10393
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Attachments: SOLR-10393.patch
>
>
> The cartesianProduct function emits multiple tuples from a single tuple. To 
> save the cartesian product in another collection, it would be useful to be 
> able to dynamically assign new unique ids to tuples. The uuid() stream 
> evaluator will allow us to do this.
> sample syntax:
> {code}
> cartesianProduct(expr, fielda, uuid() as id)
> {code} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10356) Add Streaming Evaluators for basic math functions

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952502#comment-15952502
 ] 

ASF subversion and git services commented on SOLR-10356:


Commit 674ce4e89393efe3147629e76f053c9901c182dc in lucene-solr's branch 
refs/heads/master from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=674ce4e ]

SOLR-10356: Adds basic math streaming evaluators


> Add Streaming Evaluators for basic math functions
> -
>
> Key: SOLR-10356
> URL: https://issues.apache.org/jira/browse/SOLR-10356
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Attachments: SOLR-10356.patch, SOLR-10356.patch, SOLR-10356.patch, 
> SOLR-10356.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7729) Support for string type separator for CustomSeparatorBreakIterator

2017-04-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952490#comment-15952490
 ] 

Amrit Sarkar commented on LUCENE-7729:
--

:)

I looked into SimplePatternTokenizer and how it does pattern matching using 
deterministic finite-state automata. CharacterRunAutomaton is the fundamental 
building block for the hypothetical PatternBreakIterator. It should not be much 
work, considering everything has already been implemented very extensively and 
SimplePatternTokenizer provides a perfect example. I will try to devise 
something out of it and update soon.
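
For a sense of what the automaton-based approach could look like (a minimal 
sketch, not from any patch here; the class and method names are illustrative), 
matching an arbitrary separator with a CharacterRunAutomaton might be as 
simple as:

{code:java}
import org.apache.lucene.util.automaton.CharacterRunAutomaton;
import org.apache.lucene.util.automaton.RegExp;

// Sketch of a PatternBreakIterator building block: step a deterministic
// run-automaton over the text to find the next separator match.
public class SeparatorMatcher {
  private final CharacterRunAutomaton automaton;

  public SeparatorMatcher(String pattern) {
    // RegExp -> DFA; CharacterRunAutomaton gives cheap per-character stepping.
    // (For a plain string separator, the pattern would need regexp-escaping.)
    this.automaton = new CharacterRunAutomaton(new RegExp(pattern).toAutomaton());
  }

  /** Returns the offset just past the first separator match at or after 'from', or -1. */
  public int nextBreak(String text, int from) {
    for (int start = from; start < text.length(); start++) {
      int state = 0; // the run-automaton's initial state
      for (int i = start; i < text.length(); i++) {
        state = automaton.step(state, text.charAt(i));
        if (state == -1) break;                       // dead end; retry from next start
        if (automaton.isAccept(state)) return i + 1;  // separator fully matched
      }
    }
    return -1;
  }
}
{code}

Because the scan is retried from every start offset, cases like separator 
"acab" inside "acacab" (the failure David described) are found correctly.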

> Support for string type separator for CustomSeparatorBreakIterator
> --
>
> Key: LUCENE-7729
> URL: https://issues.apache.org/jira/browse/LUCENE-7729
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Amrit Sarkar
> Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: currently CustomSeparatorBreakIterator breaks the text when the 
> _char_ passed is found.
> This improves CustomSeparatorBreakIterator so that it supports a separator of 
> string type of arbitrary length.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7729) Support for string type separator for CustomSeparatorBreakIterator

2017-04-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937734#comment-15937734
 ] 

Amrit Sarkar edited comment on LUCENE-7729 at 4/1/17 11:56 PM:
---

bq. len > 0 (as a comment) but in all cases you probably mean len > 1?
Yes, that is correct.

bq. Let me give a better example of length 3: aab would fail to match aaab. I 
just wrote a test for that to confirm it failed. Here's another example of 
length 4 that may be more clear: A separator of acab would fail to be detected 
in acacab.
I see. The implementation is flawed; the algorithm I came up with is 
incomplete, though some minor tweaking should surely make it work. I never 
considered a repetitive pattern in the separator.

bq.  To be clear, I never asked or recommended. 
David, I completely understand and am aware; I just pointed out the 
conversation that motivated me to look into it. I am thankful to you for taking 
your time out to provide healthy insights and feedback on the patch. I will not 
get discouraged if some of my work doesn't get into the main project; I want to 
contribute something useful, not flawed.

With that, I will check out SimplePatternTokenizer and the Automaton part. 
Thank you for your time again; I really appreciate it. Should I leave this JIRA 
as it is, or instead at least fix the implementation?


was (Author: sarkaramr...@gmail.com):
bq. len > 0 (as a comment) but in all cases you probably mean len > 1?
Yes, that is correct.

bq. Let me give a better example of length 3: aab would fail to match aaab. I 
just wrote a test for that to confirm it failed. Here's another example of 
length 4 that may be more clear: A separator of acab would fail to be detected 
in acacab.
I see. The implementation is flawed; the algorithm I came up with is 
incomplete, though some minor tweaking should surely make it work. I never 
considered a repetitive pattern in the separator.

bq.  To be clear, I never asked or recommended. 
David, I completely understand and am aware; I just pointed out the 
conversation that motivated me to look into it. I am thankful to you for taking 
your time out to provide healthy insights and feedback on the patch. I will not 
get discouraged if some of my work doesn't get into the main project; I want to 
contribute something useful, not flawed.

With that, I will check out SimplePatternTokenizer and the Automation part. 
Thank you for your time again; I really appreciate it. Should I leave this JIRA 
as it is, or instead at least fix the implementation?

> Support for string type separator for CustomSeparatorBreakIterator
> --
>
> Key: LUCENE-7729
> URL: https://issues.apache.org/jira/browse/LUCENE-7729
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Amrit Sarkar
> Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: currently CustomSeparatorBreakIterator breaks the text when the 
> _char_ passed is found.
> This improves CustomSeparatorBreakIterator so that it supports a separator of 
> string type of arbitrary length.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration

2017-04-01 Thread Alexandre Rafalovitch (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch resolved SOLR-9601.
-
   Resolution: Fixed
Fix Version/s: 6.6
   master (7.0)

> DIH: Radically simplify Tika example to only show relevant configuration
> -
>
> Key: SOLR-9601
> URL: https://issues.apache.org/jira/browse/SOLR-9601
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler, contrib - Solr Cell (Tika 
> extraction)
>Affects Versions: 6.x, master (7.0)
>Reporter: Alexandre Rafalovitch
>Assignee: Alexandre Rafalovitch
>  Labels: examples, usability
> Fix For: master (7.0), 6.6
>
> Attachments: tika2_20170308.tgz, tika2_20170316.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they 
> include full configurations that may obscure the teaching points. This is no 
> longer needed, as we have 3 full-blown examples in the configsets. 
> Specifically for Tika, the field type definitions were at some point 
> simplified to have fewer support files in the configuration directory. This, 
> however, means that we now have field definitions that have the same names as 
> in other examples, but different definitions. 
> Importantly, Tika does not use most (any?) of those modified definitions. 
> They are there just for completeness. Similarly, the solrconfig.xml includes 
> the extract handler even though we are demonstrating a different path of 
> using Tika. Somebody grepping through config files may get confused about 
> which configuration aspects contribute to which behavior.
> I am planning to significantly simplify the configuration and schema of the 
> Tika example to **only** show the DIH Tika extraction path. It will end up as 
> a very short and focused example.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952457#comment-15952457
 ] 

ASF subversion and git services commented on SOLR-9601:
---

Commit 812b0eebf3d50a141b952af27bbf7c225df5072d in lucene-solr's branch 
refs/heads/branch_6x from [~arafalov]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=812b0ee ]

SOLR-9601: DIH Tika example is now minimal.
Only keep definitions and files required to show Tika-extraction in DIH


> DIH: Radically simplify Tika example to only show relevant configuration
> -
>
> Key: SOLR-9601
> URL: https://issues.apache.org/jira/browse/SOLR-9601
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler, contrib - Solr Cell (Tika 
> extraction)
>Affects Versions: 6.x, master (7.0)
>Reporter: Alexandre Rafalovitch
>Assignee: Alexandre Rafalovitch
>  Labels: examples, usability
> Attachments: tika2_20170308.tgz, tika2_20170316.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they 
> include full configurations that may obscure the teaching points. This is no 
> longer needed, as we have 3 full-blown examples in the configsets. 
> Specifically for Tika, the field type definitions were at some point 
> simplified to have fewer support files in the configuration directory. This, 
> however, means that we now have field definitions that have the same names as 
> in other examples, but different definitions. 
> Importantly, Tika does not use most (any?) of those modified definitions. 
> They are there just for completeness. Similarly, the solrconfig.xml includes 
> the extract handler even though we are demonstrating a different path of 
> using Tika. Somebody grepping through config files may get confused about 
> which configuration aspects contribute to which behavior.
> I am planning to significantly simplify the configuration and schema of the 
> Tika example to **only** show the DIH Tika extraction path. It will end up as 
> a very short and focused example.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952454#comment-15952454
 ] 

ASF subversion and git services commented on SOLR-9601:
---

Commit b02626de5071c543eb6e8deea450266218238c9e in lucene-solr's branch 
refs/heads/master from [~arafalov]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b02626d ]

SOLR-9601: DIH Tika example is now minimal
Only keep definitions and files required to show Tika-extraction in DIH


> DIH: Radically simplify Tika example to only show relevant configuration
> -
>
> Key: SOLR-9601
> URL: https://issues.apache.org/jira/browse/SOLR-9601
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler, contrib - Solr Cell (Tika 
> extraction)
>Affects Versions: 6.x, master (7.0)
>Reporter: Alexandre Rafalovitch
>Assignee: Alexandre Rafalovitch
>  Labels: examples, usability
> Attachments: tika2_20170308.tgz, tika2_20170316.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they 
> include full configurations that may obscure the teaching points. This is no 
> longer needed, as we have 3 full-blown examples in the configsets. 
> Specifically for Tika, the field type definitions were at some point 
> simplified to have fewer support files in the configuration directory. This, 
> however, means that we now have field definitions that have the same names as 
> in other examples, but different definitions. 
> Importantly, Tika does not use most (any?) of those modified definitions. 
> They are there just for completeness. Similarly, the solrconfig.xml includes 
> the extract handler even though we are demonstrating a different path of 
> using Tika. Somebody grepping through config files may get confused about 
> which configuration aspects contribute to which behavior.
> I am planning to significantly simplify the configuration and schema of the 
> Tika example to **only** show the DIH Tika extraction path. It will end up as 
> a very short and focused example.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-MacOSX (64bit/jdk1.8.0) - Build # 800 - Unstable!

2017-04-01 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/800/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.ShardSplitTest.testSplitWithChaosMonkey

Error Message:
There are still nodes recoverying - waited for 330 seconds

Stack Trace:
java.lang.AssertionError: There are still nodes recoverying - waited for 330 
seconds
at 
__randomizedtesting.SeedInfo.seed([A7618497EB801B86:2C465746AA86B002]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:187)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:144)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:139)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForRecoveriesToFinish(AbstractFullDistribZkTestBase.java:865)
at 
org.apache.solr.cloud.ShardSplitTest.testSplitWithChaosMonkey(ShardSplitTest.java:437)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:992)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:967)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
   

[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3936 - Still Unstable!

2017-04-01 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3936/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test

Error Message:
Could not find collection:collection2

Stack Trace:
java.lang.AssertionError: Could not find collection:collection2
at 
__randomizedtesting.SeedInfo.seed([FEA3917DBE5D9B57:76F7AEA710A1F6AF]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:159)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:144)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:139)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForRecoveriesToFinish(AbstractFullDistribZkTestBase.java:870)
at 
org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.testIndexingBatchPerRequestWithHttpSolrClient(FullSolrCloudDistribCmdsTest.java:620)
at 
org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test(FullSolrCloudDistribCmdsTest.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-01 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952388#comment-15952388
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Hi Vikash,
I suggest you read the student manuals for GSoC.
Submit a proposal for how you want to approach this project, including as much 
technical detail as possible and detailed timelines.

Regarding the following:

{code}
1. First, understand how BooleanScorer calls these similarity classes and
does the scoring. There are unit tests in Lucene that can help you get there.
This might help: https://wiki.apache.org/lucene-java/HowToContribute
2. Write a standalone CUDA/OpenCL project that does the same processing on
the GPU.
3. Benchmark the speed of doing so on GPU vs. the speed observed when doing
the same through the BooleanScorer. Preferably on a large result set. Include
the time for copying results and scores in and out of device memory from/to
main memory.
4. Optimize step 2, if possible.
{code}

If you've already understood step 1, feel free to make a proposal on how you 
will use your GSoC coding time to achieve steps 2-4. Also, you can look at 
other stretch goals to be included in the coding time. I would consider that 
steps 2-4, if done properly and successfully, are themselves a good GSoC 
contribution. And if these steps are done properly, then either Lucene 
integration can be proposed for the latter part of the coding phase (the last 
2-3 weeks, I'd think), or exploratory work on another part of Lucene (apart 
from the BooleanScorer, e.g. spatial search filtering, etc.) could be taken up. 

Time is running out, so kindly submit a proposal as soon as possible. You can 
submit a draft first, have one of us review it and then submit it as final 
after the review. If the deadline is too close, there might not be enough time 
for this round of review, and in such a case just submit the draft as final.

Also, remember that a lot of GPGPU coding is done in C, so 
familiarity/experience with that is a plus.

(Just a suggestion that makes sense to me, and feel free to ignore: bullet 
points work better than long paragraphs, even if the length of the sentences 
stays the same.)
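
For what it's worth, here is a rough sketch of the CPU side of such a 
benchmark (illustrative only; the BM25-style parameters and flat-array layout 
are my assumptions, not anything prescribed above). The GPU version would run 
the same per-document formula in a kernel, and its timing must include the 
host/device copies mentioned in step 3.

{code:java}
public class ScoringBaseline {
  public static void main(String[] args) {
    int numDocs = 10_000_000;
    float k1 = 1.2f, b = 0.75f, idf = 2.0f, avgDocLen = 500f;
    float[] tf = new float[numDocs], docLen = new float[numDocs], scores = new float[numDocs];
    java.util.Arrays.fill(tf, 3f);      // synthetic term frequencies
    java.util.Arrays.fill(docLen, 480f); // synthetic document lengths

    long start = System.nanoTime();
    for (int i = 0; i < numDocs; i++) {
      // BM25-style score; a GPU kernel would compute one i per thread.
      float norm = k1 * (1 - b + b * docLen[i] / avgDocLen);
      scores[i] = idf * tf[i] * (k1 + 1) / (tf[i] + norm);
    }
    long ms = (System.nanoTime() - start) / 1_000_000;
    // A fair GPU comparison also counts copying tf/docLen in and scores out.
    System.out.println("Scored " + numDocs + " docs in " + ms + " ms");
  }
}
{code}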

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that could potentially be sped up if computations 
> were offloaded from the CPU to the GPU(s). With commodity GPUs having as much 
> as 12GB of high-bandwidth RAM, we might be able to leverage GPUs to speed up 
> parts of Lucene (indexing, search).
> The first thing that comes to mind is spatial filtering, which is 
> traditionally known to be a good candidate for GPU-based speedup (esp. when 
> complex polygons are involved). In the past, Mike McCandless has mentioned 
> that "both initial indexing and merging are CPU/IO intensive, but they are 
> very amenable to soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this over the summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-01 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952375#comment-15952375
 ] 

vikash commented on LUCENE-7745:


Hello all, 
I have been reading a lot about how GPUs work and about GPU parallelization, 
particularly General-Purpose computing on Graphics Processing Units, and I have 
also looked in detail at the source code of the BooleanScorer.java file. It is 
a nice piece of code, and I am having no difficulty understanding how it works; 
since Java is my specialty, the job was quite fun. A few things still seem 
unclear to me, but I am reading and experimenting, so I will resolve them soon. 
It is a nice idea to use the GPU to perform search and indexing operations on a 
document; that would be faster on the GPU. 
Regarding the licensing issue: since we are generating code and, as was said 
above, the code we generate may not necessarily go into Lucene, will licensing 
still be an issue if we use the libraries in our code? As Uwe Schindler said, 
we may host the code on GitHub. It would certainly not be a good idea to 
develop code for special hardware, but we can still give it a try and attempt 
to make it compatible with most hardware. I don't mind if this code does not go 
into Lucene, but we can try to change Lucene and make it better; I am preparing 
myself for it, and things will stay on track with your kind mentorship.
So should I submit my proposal now, or do I need to complete all four steps 
that Ishan listed in the last comment and then submit my proposal? And which of 
the ideas would you prefer to mentor me on, that is, which one do you think 
would be the better one to continue with? 

>Copy over and index lots of points and corresponding docids to the GPU as an 
>offline, one time operation. Then, given a query point, return top-n nearest 
>indexed points.
>Copy over and index lots of points and corresponding docids to the GPU as an 
>offline, one time operation. Then, given a polygon (complex shape), return all 
>points that lie inside the polygon.
>Benchmarking an aggregation over a DocValues field  and comparing the 
>corresponding performance when executed on the GPU. 
>Benchmarking the speed of calculations on GPU vs. speed observed when doing 
>the same through the BooleanScorer. Preferably, on a large result set with the 
>time for copying results and scores in and out of the device memory from/to 
>the main memory included?
-Vikash

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that could potentially be sped up if computations 
> were offloaded from the CPU to the GPU(s). With commodity GPUs having as much 
> as 12GB of high-bandwidth RAM, we might be able to leverage GPUs to speed up 
> parts of Lucene (indexing, search).
> The first thing that comes to mind is spatial filtering, which is 
> traditionally known to be a good candidate for GPU-based speedup (esp. when 
> complex polygons are involved). In the past, Mike McCandless has mentioned 
> that "both initial indexing and merging are CPU/IO intensive, but they are 
> very amenable to soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this over the summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9745) SolrCLI swallows errors from solr.cmd

2017-04-01 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-9745:
---
Summary: SolrCLI swallows errors from solr.cmd  (was: bin/solr* swallows 
errors from running example instances at least)

> SolrCLI swallows errors from solr.cmd
> -
>
> Key: SOLR-9745
> URL: https://issues.apache.org/jira/browse/SOLR-9745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 6.3, master (7.0)
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>  Labels: newbie, newdev
> Attachments: SOLR-9745.patch, SOLR-9745.patch
>
>
> It occurs in a mad scenario in LUCENE-7534:
> * solr.cmd wasn't granted +x (it happens under cygwin, yes)
> * coolhacker worked around it with cmd /C solr.cmd start -e ..
> * but when SolrCLI runs solr instances with the same solr.cmd, it just 
> silently fails
> I think we can just pass an ExecuteResultHandler which will dump the 
> exception to the console. 
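
For illustration, a handler along these lines would surface the failure (a 
sketch only: SolrCLI launches processes via Apache Commons Exec, but the class 
name here is hypothetical, not from the attached patches):

{code:java}
import org.apache.commons.exec.DefaultExecuteResultHandler;
import org.apache.commons.exec.ExecuteException;

// Instead of silently discarding the failure, dump it to the console.
public class ConsoleDumpingResultHandler extends DefaultExecuteResultHandler {
  @Override
  public void onProcessFailed(ExecuteException e) {
    super.onProcessFailed(e);
    e.printStackTrace(System.err); // make the solr.cmd failure visible
  }
}
{code}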



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-7383) DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo possible

2017-04-01 Thread Alexandre Rafalovitch (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch resolved SOLR-7383.
-
   Resolution: Fixed
Fix Version/s: 6.6
   master (7.0)

> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo 
> possible
> 
>
> Key: SOLR-7383
> URL: https://issues.apache.org/jira/browse/SOLR-7383
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 5.0, 6.0
>Reporter: Upayavira
>Assignee: Alexandre Rafalovitch
>Priority: Minor
> Fix For: master (7.0), 6.6
>
> Attachments: atom_20170315.tgz, rss-data-config.xml, SOLR-7383.patch
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) 
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure 
> RSS. Perhaps we should depend upon something more static, rather than an 
> external service that is free to change as it desires.
> {code:xml}
> <dataConfig>
>   <dataSource type="URLDataSource" />
>   <document>
>     <entity name="slashdot"
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/item"
>             transformer="DateFormatTransformer">
>       <field column="source" xpath="/RDF/channel/title" commonField="true" />
>       <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>       <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>       <field column="title" xpath="/RDF/item/title" />
>       <field column="link" xpath="/RDF/item/link" />
>       <field column="description" xpath="/RDF/item/description" />
>       <field column="creator" xpath="/RDF/item/creator" />
>       <field column="item-subject" xpath="/RDF/item/subject" />
>       <field column="date" xpath="/RDF/item/date"
>              dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>       <field column="slash-department" xpath="/RDF/item/department" />
>       <field column="slash-section" xpath="/RDF/item/section" />
>       <field column="slash-comments" xpath="/RDF/item/comments" />
>     </entity>
>   </document>
> </dataConfig>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7383) DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo possible

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952335#comment-15952335
 ] 

ASF subversion and git services commented on SOLR-7383:
---

Commit e987654aa31554fd27f3110d7def3eb782e5c199 in lucene-solr's branch 
refs/heads/branch_6x from [~arafalov]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e987654 ]

SOLR-7383: Replace DIH 'rss' example with 'atom'
rss example was broken for multiple reasons.
atom example showcases the same - and more - features
and uses the smallest config file needed to make it work.


> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo 
> possible
> 
>
> Key: SOLR-7383
> URL: https://issues.apache.org/jira/browse/SOLR-7383
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 5.0, 6.0
>Reporter: Upayavira
>Assignee: Alexandre Rafalovitch
>Priority: Minor
> Attachments: atom_20170315.tgz, rss-data-config.xml, SOLR-7383.patch
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) 
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure 
> RSS. Perhaps we should depend upon something more static, rather than an 
> external service that is free to change as it desires.
> {code:xml}
> <dataConfig>
>   <dataSource type="URLDataSource" />
>   <document>
>     <entity name="slashdot"
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/item"
>             transformer="DateFormatTransformer">
>       <field column="source" xpath="/RDF/channel/title" commonField="true" />
>       <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>       <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>       <field column="title" xpath="/RDF/item/title" />
>       <field column="link" xpath="/RDF/item/link" />
>       <field column="description" xpath="/RDF/item/description" />
>       <field column="creator" xpath="/RDF/item/creator" />
>       <field column="item-subject" xpath="/RDF/item/subject" />
>       <field column="date" xpath="/RDF/item/date"
>              dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>       <field column="slash-department" xpath="/RDF/item/department" />
>       <field column="slash-section" xpath="/RDF/item/section" />
>       <field column="slash-comments" xpath="/RDF/item/comments" />
>     </entity>
>   </document>
> </dataConfig>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7383) DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo possible

2017-04-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952321#comment-15952321
 ] 

ASF subversion and git services commented on SOLR-7383:
---

Commit 580f6e98fb033dbbb8e0921fc3175021714ce956 in lucene-solr's branch 
refs/heads/master from [~arafalov]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=580f6e9 ]

SOLR-7383: Replace DIH 'rss' example with 'atom'
rss example was broken for multiple reasons.
atom example showcases the same - and more - features
and uses the smallest config file needed to make it work.


> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo 
> possible
> 
>
> Key: SOLR-7383
> URL: https://issues.apache.org/jira/browse/SOLR-7383
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 5.0, 6.0
>Reporter: Upayavira
>Assignee: Alexandre Rafalovitch
>Priority: Minor
> Attachments: atom_20170315.tgz, rss-data-config.xml, SOLR-7383.patch
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) 
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure 
> RSS. Perhaps we should depend upon something more static, rather than an 
> external service that is free to change as it desires.
> {code:xml}
> <dataConfig>
>   <dataSource type="URLDataSource" />
>   <document>
>     <entity name="slashdot"
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/item"
>             transformer="DateFormatTransformer">
>       <field column="source" xpath="/RDF/channel/title" commonField="true" />
>       <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>       <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>       <field column="title" xpath="/RDF/item/title" />
>       <field column="link" xpath="/RDF/item/link" />
>       <field column="description" xpath="/RDF/item/description" />
>       <field column="creator" xpath="/RDF/item/creator" />
>       <field column="item-subject" xpath="/RDF/item/subject" />
>       <field column="date" xpath="/RDF/item/date"
>              dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>       <field column="slash-department" xpath="/RDF/item/department" />
>       <field column="slash-section" xpath="/RDF/item/section" />
>       <field column="slash-comments" xpath="/RDF/item/comments" />
>     </entity>
>   </document>
> </dataConfig>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests

2017-04-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952316#comment-15952316
 ] 

Erick Erickson commented on SOLR-10229:
---

bq: If you look closely at the public methods exposed to be used, all are 
static and h.getCore each time will fetch the current test-suite's core and 
its schema, which is correct, no?

h.getCore() is overly restrictive and doesn't support having more than one core 
open and modifying the schema. The problem is that it fetches _the_ test core, 
which is limiting. It's convenient for writing tests that only operate on a 
single core; for more complex situations it's quite restrictive.

Take a look at, for instance, TestLazyCores. It has to do some fancy dancing, 
but it opens multiple cores, so it has to bypass h.getCore() completely. 
Admittedly they all use the same schema, but that doesn't matter, since if I 
wanted each of those cores to have new field definitions I couldn't use 
h.getCore(), even implicitly, and even if all the new field definitions were 
the same.

bq: ...different cores with different schemas in the same test in our 
test-suites... Are there such use cases? 

Not that I know of offhand, but that doesn't mean anything really; there's a 
_lot_ of test code ;). It's unnecessarily restrictive to confine ourselves to 
that paradigm, though. And, as above, using h.getCore() doesn't allow modifying 
schemas for more than one core in any given test.

bq:  I will do repetitive forced testing for two or more test suites 
simultaneously and observe what's happening.

This isn't quite the issue. If we try to persist _anything_ to the "source 
tree", which includes all the config files in this case, the test framework 
should throw an exception. I'm not worried about multiple cores making 
modifications to the on-disk files; _no_ mods should be allowed unless the 
configs are in a temp dir. You'll see lots of code like this (again from 
TestLazyCores, since I know that code):

// create a throw-away solr home in a temp dir; the harness cleans it up
solrHomeDirectory = createTempDir().toFile();
File coreRoot = new File(solrHomeDirectory, coreName);
// copy a minimal config into the new core's directory
copyMinConf(coreRoot, "name=" + coreName);

so having the temp dir (which is automagically cleaned up by the test harness) 
is required to change anything on-disk, and just using this new approach 
shouldn't require creating a tmp dir and copying stuff to it.

> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --
>
> Key: SOLR-10229
> URL: https://issues.apache.org/jira/browse/SOLR-10229
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: SOLR-10229.patch
>
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema. Which makes schemas bigger and 
> bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> Look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file. Adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present, this is pretty fuzzy, I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy. But when you get into solr.TextField-based types 
> it gets less straight-forward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [~hossman_luc...@fucit.org] in particular might have 
> something to say.
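
For a flavor of what constructing schema in-test could look like, here is a 
sketch using SolrJ's existing Schema API against a managed-schema core; it is 
not the utility in the attached patch, and the field and collection names are 
illustrative:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class SchemaOnTheFly {
  // Define a one-off field from inside a test instead of adding
  // yet another static schema file.
  static void addField(SolrClient client, String collection) throws Exception {
    Map<String, Object> field = new LinkedHashMap<>();
    field.put("name", "maxlen_text");
    field.put("type", "text_general"); // reuse an existing field type
    field.put("stored", true);
    field.put("multiValued", false);
    // Sends an add-field command to the collection's /schema endpoint.
    new SchemaRequest.AddField(field).process(client, collection);
  }
}
{code}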



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests

2017-04-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952298#comment-15952298
 ] 

Amrit Sarkar commented on SOLR-10229:
-


Thank you for the correction and the suggestions.

bq. It looks like you're thinking to have test classes subclass this. Could it 
be instantiated as a static member of SolrTestCaseJ4 somehow? I think that's 
less confusing and all current tests would immediately have access. The only 
thing I see on a quick glance that really requires SolrTestCaseJ4 is 
h.getCore(), so that would probably mean we need to pass the core in to the 
methods that need it.

If you look closely at the public methods exposed to be used, all are static, 
and h.getCore each time will fetch the current test-suite's core and its 
schema, which is correct, no? Developers will directly access these methods 
without inheriting from or creating a framework object. 

bq. Using h.getCore() doesn't accommodate having different cores with different 
schemas in the same test.

I am not very aware of the above, _different cores with different schemas in 
the same test_, in our test-suites. Are there such use cases? I will look for 
them.

bq. I doubt we should persist any changes.

Makes sense. I will do repetitive forced testing for two or more test suites 
simultaneously and observe what's happening.

I am making the necessary changes to what's already there and completing the 
rest; I will update soon.




> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --
>
> Key: SOLR-10229
> URL: https://issues.apache.org/jira/browse/SOLR-10229
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: SOLR-10229.patch
>
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema. Which makes schemas bigger and 
> bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> Look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file. Adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present, this is pretty fuzzy, I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy. But when you get into solr.TextField-based types 
> it gets less straight-forward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [~hossman_luc...@fucit.org] in particular might have 
> something to say.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-04-01 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952283#comment-15952283
 ] 

Joel Bernstein edited comment on SOLR-10351 at 4/1/17 3:46 PM:
---

bq. Wouldn't the NLP processing as advertised in the title of this issue be 
most likely to put its processing into analysis attributes? This stream 
evaluator only emits the character data attribute.

Possibly. I definitely have much to learn about the analysis chain. In the 
first pass I was mostly interested in getting the token stream from the 
analysis chain. What I had envisioned for the future was having analysis chains 
that perform sentence chunking, entity extraction, noun phrase extraction, 
etc. I was seeing these as finished token streams. But exposing the 
analysis attributes would seem to make sense in the future.

bq. BTW Please use try-finally (even try-with-resources style) to close 
token-streams wherever possible. Analyzer internal parts are internally shared 
in thread-locals and the ramifications can be nasty on the entire Solr node if 
at any time one filter has a bug or something on a particular value. Your Solr 
node then becomes poisoned in a sense and only a restart will fix the ailment.

Will do.


was (Author: joel.bernstein):
bq. Wouldn't the NLP processing as advertised in the title of this issue be 
most likely to put its processing into analysis attributes? This stream 
evaluator only emits the character data attribute.

Possibly. I definitely have much to learn about the analysis chain. In the 
first pass I was mostly interested in getting the token stream from the 
analysis chain. What I had envisioned for the future was having token streams 
that perform sentence chunking, entity extraction, noun phrase extraction, 
etc. I was seeing these as finished token streams. But exposing the 
analysis attributes would seem to make sense in the future.

bq. BTW Please use try-finally (even try-with-resources style) to close 
token-streams wherever possible. Analyzer internal parts are internally shared 
in thread-locals and the ramifications can be nasty on the entire Solr node if 
at any time one filter has a bug or something on a particular value. Your Solr 
node then becomes poisoned in a sense and only a restart will fix the ailment.

Will do.
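
As a concrete illustration of the pattern David describes (a minimal sketch; 
the analyzer, field, and class names are assumed for the example, not taken 
from the patch):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzeSketch {
  static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
    List<String> out = new ArrayList<>();
    // try-with-resources guarantees close() even if a filter throws,
    // so the analyzer's thread-local components are never left "poisoned".
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }
}
{code}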

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-04-01 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952283#comment-15952283
 ] 

Joel Bernstein commented on SOLR-10351:
---

bq. Wouldn't the NLP processing as advertised in the title of this issue be 
most likely to put its processing into analysis attributes? This stream 
evaluator only emits the character data attribute.

Possibly. I definitely have much to learn about the analysis chain. In the 
first pass I was mostly interested in getting the token stream from the 
analysis chain. What I had envisioned for the future was having token streams 
that perform sentence chunking, entity extraction, noun phrase extraction, 
etc. I was seeing these as finished token streams, but exposing the analysis 
attributes would seem to make sense in the future.

bq. BTW Please use try-finally (even try-with-resources style) to close 
token streams wherever possible. Analyzer internal parts are shared in 
thread-locals, and the ramifications can be nasty on the entire Solr node if 
at any time one filter has a bug or something on a particular value. Your Solr 
node then becomes poisoned in a sense, and only a restart will fix the ailment.

Will do.
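
As a concrete illustration, here's a minimal sketch of an expression that 
expands each search result into one tuple per emitted token. The {{articles}} 
collection and {{body_t}} field are made up, and per the sample syntax in the 
issue description below, the same field supplies both the analyzer and the 
text:

{code}
cartesianProduct(
  search(articles, q="*:*", fl="id,body_t", sort="id asc"),
  analyze(body_t, body_t) as term
)
{code}

Sent to the /stream handler, each tuple from {{search}} comes back once per 
token in {{term}}, which is what makes it possible to index the tokens as 
individual documents in another collection.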

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-04-01 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952245#comment-15952245
 ] 

David Smiley commented on SOLR-10351:
-

Wouldn't the NLP processing as advertised in the title of this issue be most 
likely to put its processing into analysis _attributes_? This stream 
evaluator only emits the character data attribute.

BTW Please use try-finally (even try-with-resources style) to close 
token streams wherever possible. Analyzer internal parts are shared in 
thread-locals, and the ramifications can be nasty on the entire Solr node if 
at any time one filter has a bug or something on a particular value. Your Solr 
node then becomes poisoned in a sense, and only a restart will fix the ailment.
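
For reference, a minimal sketch of the pattern being suggested. 
{{TokenStream}} implements {{Closeable}}, so try-with-resources works 
directly; the helper class and names below are illustrative, not from the 
patch:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative helper: collect the terms produced for a single field value.
public final class AnalyzeSketch {
  public static List<String> analyzeValue(Analyzer analyzer, String field,
      String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    // try-with-resources guarantees close() runs, releasing the analyzer's
    // thread-local components even if a filter throws on this value.
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                        // required before incrementToken()
      while (ts.incrementToken()) {
        tokens.add(termAtt.toString()); // the character data attribute
      }
      ts.end();                          // finalize end-of-stream state
    }
    return tokens;
  }
}
{code}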

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3935 - Still Unstable!

2017-04-01 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3935/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

8 tests failed.
FAILED:  org.apache.solr.cloud.CustomCollectionTest.testRouteFieldForHashRouter

Error Message:
Collection not found: routeFieldColl

Stack Trace:
org.apache.solr.common.SolrException: Collection not found: routeFieldColl
    at __randomizedtesting.SeedInfo.seed([6DD5429E60B0C9D4:C5E3DC43FFD1228E]:0)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(CloudSolrClient.java:1382)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1075)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1054)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
    at org.apache.solr.client.solrj.request.UpdateRequest.commit(UpdateRequest.java:233)
    at org.apache.solr.cloud.CustomCollectionTest.testRouteFieldForHashRouter(CustomCollectionTest.java:166)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at