[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

erikerlandson Sun, 14 Sep 2014 08:35:26 -0700

Github user erikerlandson commented on the pull request:

    https://github.com/apache/spark/pull/2313#issuecomment-55529273
  
    @mattf, one useful question would be: do the results have equivalent 
output distributions?  The basic methodology would be to collect output in 
both scenarios and run Kolmogorov-Smirnov tests to assess whether the sampling 
is statistically equivalent.
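
    As a rough illustration of that methodology (a minimal sketch, not the code 
from the gist below): the two "scenarios" here are hypothetical stand-ins that 
just draw uniform values from separately seeded generators, and the names 
`ks_statistic` and `ks_critical_value` are made up for this example.  The 
critical value uses the standard large-sample approximation for the two-sample 
test.

```python
import math
import random


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step past every value equal to v in both samples (handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the rejection threshold at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical stand-ins for "output collected in scenario A / scenario B".
rng_a = random.Random(1)
rng_b = random.Random(2)
sample_a = [rng_a.random() for _ in range(2000)]
sample_b = [rng_b.random() for _ in range(2000)]
# A deliberately different distribution, to show what a rejection looks like.
sample_shifted = [x + 0.5 for x in sample_b]

threshold = ks_critical_value(len(sample_a), len(sample_b))
print("same distribution:   D =", ks_statistic(sample_a, sample_b),
      "threshold =", threshold)
print("shifted distribution: D =", ks_statistic(sample_a, sample_shifted),
      "threshold =", threshold)
```

    A value of D above the threshold would be evidence that the two scenarios 
are not sampling from the same distribution.  At alpha = 0.05, roughly one in 
twenty comparisons of genuinely equivalent samplers will still exceed it, so 
repeated runs are worth doing before drawing conclusions.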
    
    I did this recently for testing my upcoming proposal for gap sampling:
    https://gist.github.com/erikerlandson/05db1f15c8d623448ff6
    
    That doesn't cover the question of *exactly* reproducible results.  I'm not 
sure whether that would be feasible.  In general, I only consider *exactly* 
reproducible results relevant for things like unit testing applications, so if 
that's important, my answer would be "make sure your environment is set up to 
either use numpy or not, consistently."
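
    One way a test suite might enforce that consistency is sketched below.  
This is an assumption about how one could do it, not anything from the pull 
request or from Spark itself; `numpy_available` and `require_backend` are 
hypothetical helper names.

```python
import importlib.util


def numpy_available():
    """Report whether numpy can be imported in this interpreter."""
    return importlib.util.find_spec("numpy") is not None


def require_backend(expect_numpy):
    """Fail fast if the environment does not match what the tests assume.

    expect_numpy: True if the exact expected values in the tests were
    recorded with numpy installed, False if they were recorded without it.
    """
    have = numpy_available()
    if have != expect_numpy:
        raise RuntimeError(
            "tests expect numpy %s, but numpy is %s in this environment"
            % ("present" if expect_numpy else "absent",
               "present" if have else "absent")
        )
    return have
```

    Calling `require_backend(True)` (or `False`) once at the start of a test 
run would turn a silent change of random number source into an explicit 
failure.  It only checks the interpreter it runs in, so on a cluster each 
worker's environment would need the same guarantee.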





