[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-11-12 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972916#comment-16972916
 ] 

Joel Bernstein commented on SOLR-13864:
---

[~hossman], I'll fix the others as well. 

> MathExpressionTest non-reproducible failures due to assertions of 
> non-absolutes and randomization beyond test seed
> --
>
> Key: SOLR-13864
> URL: https://issues.apache.org/jira/browse/SOLR-13864
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: apache_Lucene-Solr-BadApples-Tests-master_531.log.txt
>
>
> We're seeing a fairly steady trickle of MathExpressionTest failures from 
> various Jenkins boxes going back quite a while ... mostly from 
> testGammaDistribution, but other tests pop up now and then.
> The crux of the problem with this test seems to break down into 2 categories:
>  # tests that make assumptions about the relative values that will come out 
> of taking samples from different random distributions that aren't guaranteed 
> to be true
>  ** e.g.: comparing 2 random samples from 2 differently shaped gamma 
> distributions and expecting one to always be strictly greater than the other. 
> I'm not a stats guy, but my naive understanding is that on the low end some 
> of these shapes may cross over, so every possible random sample from one 
> shape is not guaranteed to be less than every possible random sample from a 
> different shape
>  # the code being tested does its own randomization outside of the control 
> of the test framework (or test client)
>  ** this causes the seeds to not reproduce
> 
> Tests should not be making assertions about random data that aren't 100% 
> guaranteed to be true in all cases (e.g.: {{random().nextInt(5) < (5.0D + 
> (double) random().nextInt(5))}} is one thing, {{random().nextInt(5) < 
> (4.9D + (double) random().nextInt(5))}} is a different story).
> Randomized behavior in Solr (non-test) code should ideally have some way of 
> being controlled by the client/tests ... either via a request param used to 
> initialize any new Random instances, or for example the use of the 
> "tests.seed" property in various places in the code to try and provide some 
> reproducibility even when the external Solr client isn't even aware of 
> randomization being a factor in the behavior of the code.
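
To illustrate the second suggestion in the description above, here is a minimal Java sketch of code that takes its seed from the "tests.seed" system property when it is set. The class name, the helper newRandom, the placeholder seed value, and the choice to hash the property string into a seed are all assumptions made for illustration; this is not how Solr or the Lucene test framework actually parses the property.

```java
import java.util.Random;

class SeededRandomSketch {

  // Hypothetical helper: returns a Random seeded from the given string, or an
  // unseeded Random when no seed is supplied. Hashing the string is an
  // assumption of this sketch, not the test framework's real parsing.
  static Random newRandom(String seedProperty) {
    if (seedProperty == null || seedProperty.isEmpty()) {
      return new Random();
    }
    return new Random(seedProperty.hashCode());
  }

  public static void main(String[] args) {
    // Fall back to a fixed placeholder so the sketch is reproducible when run directly.
    String seed = System.getProperty("tests.seed", "DEADBEEF");
    Random first = newRandom(seed);
    Random second = newRandom(seed);
    for (int i = 0; i < 5; i++) {
      int a = first.nextInt(100);
      int b = second.nextInt(100);
      if (a != b) {
        throw new AssertionError("same seed produced different values");
      }
    }
    System.out.println("same seed, same sequence");
  }
}
```

With a helper along these lines, any code path that needs randomness would ask for a Random through the helper rather than creating an unseeded one, so a failing run could in principle be replayed from the reported seed.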



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-11-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972767#comment-16972767
 ] 

ASF subversion and git services commented on SOLR-13864:


Commit b872863da9e06a9f21562830cbd26d24d30a8138 in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b872863 ]

SOLR-13864: SolrTestCaseJ4.getNextAvailablePort() has been deprecated

(cherry picked from commit 603be023feaf3f8e3e739e532b488068710d9097)





[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-11-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972758#comment-16972758
 ] 

ASF subversion and git services commented on SOLR-13864:


Commit 603be023feaf3f8e3e739e532b488068710d9097 in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=603be02 ]

SOLR-13864: SolrTestCaseJ4.getNextAvailablePort() has been deprecated





[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-11-05 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967833#comment-16967833
 ] 

Joel Bernstein commented on SOLR-13864:
---

I pushed a fix that makes the test non-probabilistic and also tests more 
functions than just sampling. This test should pass every time, unless there is 
something that I'm not understanding about how the underlying Commons Math API 
works.
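
The commit itself is not reproduced in this thread, so the following is only a sketch of what a non-probabilistic check can look like: it asserts on a closed-form value of a distribution rather than on samples drawn from it. The class name, the exponentialCdf helper, and the choice of an exponential distribution are assumptions made for illustration and are not taken from MathExpressionTest.

```java
class DeterministicCheckSketch {

  // Cumulative probability of an exponential distribution with the given mean.
  static double exponentialCdf(double mean, double x) {
    if (x <= 0.0) {
      return 0.0;
    }
    return 1.0 - Math.exp(-x / mean);
  }

  public static void main(String[] args) {
    // At x == mean the CDF equals 1 - 1/e, independent of any random seed.
    double expected = 1.0 - 1.0 / Math.E;
    double actual = exponentialCdf(2.0, 2.0);
    if (Math.abs(actual - expected) > 1e-12) {
      throw new AssertionError("unexpected CDF value: " + actual);
    }
    System.out.println("deterministic check passed");
  }
}
```

Because no random draw is involved, a check of this kind gives the same result on every run and under every seed.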




[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967831#comment-16967831
 ] 

ASF subversion and git services commented on SOLR-13864:


Commit 4f849e7a496e9ce0065d65390642af9949f60a65 in lucene-solr's branch 
refs/heads/master from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4f849e7 ]

SOLR-13864: MathExpressionTest non-reproducible failures due to assertions of 
non-absolutes and randomization beyond test seed





[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-11-05 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967569#comment-16967569
 ] 

Joel Bernstein commented on SOLR-13864:
---

I'll fix this. I went overboard on these tests. There's no need to test that 
random behavior is as expected; we just need to test that the function works 
and brings back results that look right. We're not testing that the underlying 
math is right (those tests are in Apache Commons Math); we're just testing that 
the integration is working.
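
As a rough illustration of checking that results "look right" without asserting on random behavior, the Java sketch below verifies only properties that hold for every possible draw: the number of samples returned and that each value is finite and non-negative. The sample method is a hypothetical stand-in for a sampling function under test (it draws from an exponential distribution by inverse transform), and the class and method names are assumptions, not Solr or Commons Math APIs.

```java
import java.util.Random;

class SampleInvariantSketch {

  // Hypothetical stand-in for the function under test: draws n values from an
  // exponential distribution with the given mean using inverse transform sampling.
  static double[] sample(Random rnd, double mean, int n) {
    double[] out = new double[n];
    for (int i = 0; i < n; i++) {
      // nextDouble() is in [0, 1), so the argument to log is in (0, 1].
      out[i] = -mean * Math.log(1.0 - rnd.nextDouble());
    }
    return out;
  }

  // True when the result has the requested size and every value is finite and >= 0.
  static boolean looksRight(double[] samples, int expectedSize) {
    if (samples.length != expectedSize) {
      return false;
    }
    for (double v : samples) {
      if (Double.isNaN(v) || Double.isInfinite(v) || v < 0.0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // The seed is deliberately unspecified: the invariants must hold for any seed.
    double[] samples = sample(new Random(), 3.0, 50);
    if (!looksRight(samples, 50)) {
      throw new AssertionError("sample invariants violated");
    }
    System.out.println("invariants hold");
  }
}
```

Checks like these exercise the integration path while leaving the statistical correctness of the distribution to the library's own tests.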

 

 




[jira] [Commented] (SOLR-13864) MathExpressionTest non-reproducible failures due to assertions of non-absolutes and randomization beyond test seed

2019-10-23 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958132#comment-16958132
 ] 

Chris M. Hostetter commented on SOLR-13864:
---

Hey [~jbernste] - any thoughts on these two issues? 

Fixing the reproducibility aspects of the code seems like it may be hard given 
how much of the math code is based on randomization (including randomization 
built into external libraries), but can you take a look at some of these 
failures and see if there are ways to harden the assertions themselves to be 
more absolute regardless of the random data?
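
As a sketch of what a "more absolute" assertion means, the Java snippet below checks a comparison against every value the two random draws could take instead of against one sampled pair. The class name, the holdsForAll method, and the offsets 5.0 and 3.5 are chosen for this illustration and do not come from MathExpressionTest.

```java
class AbsoluteAssertionSketch {

  // True when a < offset + b for every pair of ints a, b in [0, bound),
  // i.e. for every pair that two calls to nextInt(bound) could return.
  static boolean holdsForAll(int bound, double offset) {
    for (int a = 0; a < bound; a++) {
      for (int b = 0; b < bound; b++) {
        if (!(a < offset + (double) b)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // The largest possible left side is 4 and the smallest right side is 5.0,
    // so this comparison is safe to assert on any random draw.
    System.out.println(holdsForAll(5, 5.0));
    // Here a = 4, b = 0 gives 4 < 3.5, which is false, so an assertion built
    // on this comparison would fail on some draws and pass on others.
    System.out.println(holdsForAll(5, 3.5));
  }
}
```

An assertion is only safe to make on random data when the exhaustive (or analytical) version of the check holds; otherwise the test passes or fails depending on the draw.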
