RE: [GSoC][STATISTICS][Regression] Architecture ImplementationSuggestions

2019-05-16 Thread Ben Nguyen
Hello,

This post has been split into four JIRA sub-tasks of STATISTICS-8 as suggested:
https://issues.apache.org/jira/browse/STATISTICS-8

Hopefully anyone who has input will comment under each sub-task thread.
The UML diagrams are also visible in the description and attachments of 
STATISTICS-11.

Cheers,
-Ben Nguyen

From: Gilles Sadowski
Sent: Thursday, May 16, 2019 8:27 AM
To: Commons Developers List
Subject: Re: [GSoC][STATISTICS][Regression] Architecture 
ImplementationSuggestions

Hello.

On Thu, 16 May 2019 at 10:02, Ben Nguyen wrote:
>
> Hello,
>
>
>
> I have some broad ideas about how the regression module should be 
> structured, as briefly outlined with UML diagrams in my proposal.
>
> This is the current implementation inside commons-math-stat-regression:

It seems there is/was an image here but I don't see it.

For this kind of information, please use JIRA (and provide the link here).

>
>
> This is my proposed idea, where the structure was partly inspired by SuanShu 
> since it supported multiple types of regression (including logistic):
>
> https://github.com/aaiyer/SuanShu/tree/master/src/main/java/com/numericalmethod/suanshu/stats/regression/linear
>
>
>
> Disclaimer: I have only studied some econometrics and second-year computer 
> science at university, so I have zero professional data engineering 
> experience, but I am excited to start learning with this project. I don’t 
> currently know the exact needs of data engineers with regard to this module 
> and am learning as I go, which is why I would very much appreciate any input 
> on the kinds of requirements data engineers would want from this regression 
> module.

Basing a design on use-cases is very useful.
You should collect a range of them (small/large datasets, in-memory/stream,
dense/sparse) in order to figure what parts of the code can be common and
what requires specialization.

>
> From someone who has used the current implementation or will use this new 
> implementation:
>
> What would make your life easier?
> What should definitely be kept?
> What should be added/improved?
> Any specific features or design criteria?
> Any changes or radically different approaches to the following idea?

Good questions!
What are your answers? ;-)

> Note: OLS, GLS and Logistic regression are the first to be implemented, with 
> a focus on architectural support for further additions. Changes will make 
> use of new Java 8 features, specifically the Java Streams API, to improve 
> performance and readability.
>

+1
I'd suggest selecting one and starting to code, without worrying that you'll
probably have to change a lot of it as more use-cases are collected.

>
>
> Updates to this proposed implementation UML in my proposal:
>
> “statistics-regression-reqLinearMath” will be replaced with EJML as suggested 
> by Mr. Eric Barnhill
>
> This will include a custom matrix class extended from EJML’s SimpleBase -> 
> StatisticsMatrix
> So if we decide to use an Apache Commons implementation of matrices later on, 
> only this class should be changed internally.

Good precaution; but I doubt that we can include everything in a
single class.
How to best encapsulate the linear algebra (external) library is a
subject on its own, worth its own thread:  Cramming many questions
in a single post makes it likely that some will be missed by some
people who might later on question the chosen path.  [External
dependencies are a sensitive issue in Commons...]
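
One way to picture the encapsulation discussed above is the following minimal sketch. It is an illustration under assumptions only: `RegressionMatrix` and `ArrayRegressionMatrix` are hypothetical names, not existing Commons or EJML types, and the plain-array backend merely stands in for whichever library is eventually chosen. The idea is that regression code would depend only on the small interface, so changing the backend would mean adding another implementation rather than touching callers.

```java
import java.util.Arrays;

/** Hypothetical facade: regression code would depend only on this interface. */
interface RegressionMatrix {
    int rows();
    int cols();
    double get(int row, int col);
    RegressionMatrix transpose();
    RegressionMatrix multiply(RegressionMatrix other);
}

/** Plain-array backend; a library-backed class could implement the same interface. */
final class ArrayRegressionMatrix implements RegressionMatrix {
    private final double[][] data;

    ArrayRegressionMatrix(double[][] values) {
        // Defensive copy so that callers cannot mutate the matrix afterwards.
        data = new double[values.length][];
        for (int i = 0; i < values.length; i++) {
            data[i] = Arrays.copyOf(values[i], values[i].length);
        }
    }

    @Override public int rows() { return data.length; }
    @Override public int cols() { return data.length == 0 ? 0 : data[0].length; }
    @Override public double get(int row, int col) { return data[row][col]; }

    @Override
    public RegressionMatrix transpose() {
        double[][] t = new double[cols()][rows()];
        for (int i = 0; i < rows(); i++) {
            for (int j = 0; j < cols(); j++) {
                t[j][i] = data[i][j];
            }
        }
        return new ArrayRegressionMatrix(t);
    }

    @Override
    public RegressionMatrix multiply(RegressionMatrix other) {
        if (cols() != other.rows()) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double[][] p = new double[rows()][other.cols()];
        for (int i = 0; i < rows(); i++) {
            for (int j = 0; j < other.cols(); j++) {
                for (int k = 0; k < cols(); k++) {
                    p[i][j] += data[i][k] * other.get(k, j);
                }
            }
        }
        return new ArrayRegressionMatrix(p);
    }
}
```

With this shape, an expression such as `x.transpose().multiply(x)` (the X'X product used by OLS) is written once against the interface and does not name the backing library.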

Also, a reminder that we need to take into account the comparative
benchmarks which I posted recently.  [Even if just to conclude that
EJML has overwhelming advantages (which?) that make it more
suitable than its "competitors".]

>
> Abstract classes should have interfaces above them or perhaps just be 
> interfaces if a simpler approach is implemented (ie minimal OOP)
>
> Notes about this proposed implementation:
>
> AbstractVariables and its child classes may not be necessary, i.e. just 
> Estimators and Residuals classes
> Or perhaps it’s best to follow the current implementation’s example and have 
> a single class per regression type for hierarchy simplicity (but risking 
> redundancies)?
> I have not looked into specific data members or individual methods yet. So 
> far just taking notes from the current implementation and SuanShu
> The “statistics-regression-updating” components have quite complex algorithms 
> which will require a lot of time for me to understand completely
>
> So for now, I see myself making minimal changes to them, prioritizing the new 
> “stored” components.

IMHO, this will be better discussed once an initial implementation is shown
(or perhaps, as Eric suggested, with unit tests).

Again, better to start a new thread for each specific question, possibly backed
with a new JIRA report focussed on a particular task (see "Create sub-tasks"
on JIRA).

>
> RegressionDataLoader’s purpose is to:
>
> provide a clean input interface
> and to ensure that data from say double[ ][ ] is only 

Re: [Lang] BigDecimalStatistics proposition

2019-05-16 Thread Gilles Sadowski
Hi.

On Thu, 16 May 2019 at 22:45, Aleksander Ściborek wrote:
>
> Should I create a new Maven commons-statistics submodule for this?

[If the current idea is to put the functionality in "Commons Statistics", you
should change this thread's "Subject:" line.]

Then, yes, there should be a new module.

> Besides
> the BigDecimalStatistics I'm going to create support for downstream
> operators for BigDecimals and maybe BigIntegers.

Is the goal to "only" mimic the JDK's "IntSummaryStatistics", or do you
have a specific use-case?
In the latter case, it will be worth considering how all the functionality in
Commons Math's "o.a.c.math4.stat.descriptive" package[1] will be
supported.
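
For readers unfamiliar with the JDK pattern being referred to, here is a minimal sketch of what a `BigDecimal` analogue of `IntSummaryStatistics` might look like. It is an assumption for illustration, not the code attached to LANG-1459: the class name `BigDecimalSummaryStatistics` and its method names are hypothetical, and the average takes an explicit `MathContext` because `BigDecimal` division can otherwise fail on non-terminating expansions.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.function.Consumer;

/** Hypothetical accumulator mirroring the shape of java.util.IntSummaryStatistics. */
final class BigDecimalSummaryStatistics implements Consumer<BigDecimal> {
    private long count;
    private BigDecimal sum = BigDecimal.ZERO;
    private BigDecimal min; // null until the first value is accepted
    private BigDecimal max; // null until the first value is accepted

    @Override
    public void accept(BigDecimal value) {
        count++;
        sum = sum.add(value);
        min = (min == null || value.compareTo(min) < 0) ? value : min;
        max = (max == null || value.compareTo(max) > 0) ? value : max;
    }

    /** Merges another accumulator into this one (needed for parallel streams). */
    public void combine(BigDecimalSummaryStatistics other) {
        if (other.count == 0) {
            return;
        }
        count += other.count;
        sum = sum.add(other.sum);
        min = (min == null || other.min.compareTo(min) < 0) ? other.min : min;
        max = (max == null || other.max.compareTo(max) > 0) ? other.max : max;
    }

    public long getCount() { return count; }
    public BigDecimal getSum() { return sum; }
    public BigDecimal getMin() { return min; }
    public BigDecimal getMax() { return max; }

    /** Average rounded according to the given context; zero when no values were seen. */
    public BigDecimal getAverage(MathContext context) {
        return count == 0 ? BigDecimal.ZERO : sum.divide(BigDecimal.valueOf(count), context);
    }
}
```

It would be used with the three-argument `Stream.collect`, in the same way as the JDK classes: `stream.collect(BigDecimalSummaryStatistics::new, BigDecimalSummaryStatistics::accept, BigDecimalSummaryStatistics::combine)`.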

Regards,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/main/java/org/apache/commons/math4/stat/descriptive



> On Wed, 15 May 2019 at 03:36, Eric Barnhill  wrote:
>
> > Yes. This sounds great for commons-statistics. Other work in a similar vein
> > will be happening this summer by one of our GSOC mentees.
> >
> > On Tue, May 14, 2019, 15:04 Gary Gregory  wrote:
> >
> > > We have a Commons Statistics component that might be a fit.
> > >
> > > Gary
> > >
> > > On Tue, May 14, 2019, 17:34 Aleksander Ściborek <
> > > aleksanderscibo...@gmail.com> wrote:
> > >
> > > > Hi, I've come up with an idea to make it easier to use Stream with
> > > > the BigDecimal class.
> > > > The idea is to create a BigDecimalStatistics class which provides a
> > > > convenient way of calculating the max, min, average and sum of the
> > > > BigDecimals in a Stream.
> > > > I think that it's very suitable for a commons library.
> > > > Should it be implemented in commons lang or commons math? I believe
> > > > that it's more suitable for commons lang.
> > > > This is a link to the Jira ticket: LANG-1459
> > > > 
> > > > Aleksander
> > > >
> > >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Lang] BigDecimalStatistics proposition

2019-05-16 Thread Aleksander Ściborek
Should I create a new Maven commons-statistics submodule for this? Besides
the BigDecimalStatistics I'm going to create support for downstream
operators for BigDecimals and maybe BigIntegers.

On Wed, 15 May 2019 at 03:36, Eric Barnhill  wrote:

> Yes. This sounds great for commons-statistics. Other work in a similar vein
> will be happening this summer by one of our GSOC mentees.
>
> On Tue, May 14, 2019, 15:04 Gary Gregory  wrote:
>
> > We have a Commons Statistics component that might be a fit.
> >
> > Gary
> >
> > On Tue, May 14, 2019, 17:34 Aleksander Ściborek <
> > aleksanderscibo...@gmail.com> wrote:
> >
> > > > Hi, I've come up with an idea to make it easier to use Stream with
> > > > the BigDecimal class.
> > > > The idea is to create a BigDecimalStatistics class which provides a
> > > > convenient way of calculating the max, min, average and sum of the
> > > > BigDecimals in a Stream.
> > > > I think that it's very suitable for a commons library.
> > > > Should it be implemented in commons lang or commons math? I believe
> > > > that it's more suitable for commons lang.
> > > > This is a link to the Jira ticket: LANG-1459
> > > 
> > > Aleksander
> > >
> >
>


Re: [io] NIO2 and non-default file system support

2019-05-16 Thread Gary Gregory
Hi Mark,

Mining JIRA or GitHub PRs is always a good idea. Aside from that, I'd say
just work on something that interests you.

Gary

On Thu, May 16, 2019 at 3:53 PM Chesney, Mark 
wrote:

> Thank you both. I'd be willing to make some more contributions towards
> modernization of the API if you have some idea where the greatest need or
> highest value is. Are there any existing JIRA issues that come to mind?
>
> Regards,
> Mark
>
> > -----Original Message-----
> > From: Matt Sicker 
> > Sent: Thursday, May 16, 2019 8:22 AM
> > To: Commons Developers List 
> > Subject: Re: [io] NIO2 and non-default file system support
> >
> > Thanks, Gary!
> >
> > On Thu, 16 May 2019 at 05:37, Gary Gregory 
> wrote:
> > >
> > > The one patch has been merged.
> > >
> > > Gary
> > >
> > > On Thu, May 16, 2019, 02:03 Matt Sicker  wrote:
> > >
> > > > I’ve been interested in seeing IO be updated to include support for
> > > > using Path instead of just File. I can review your PR this week,
> > > > though I’m certainly not the only one who can.
> > > >
> > > > On Tue, May 14, 2019 at 13:19, Chesney, Mark
> > > > 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > A while back I ran into a situation where I needed to read the
> > > > > > lines of a file that might be on a non-default file system, like
> > > > > > an in-memory file system, on Java 7+. I looked to the commons-io
> > > > > > ReversedLinesFileReader, but it only works with java.io.File
> > > > > > files, which are always on the default file system. I duplicated
> > > > > > the class in my project and found it was relatively
> > > > > > straightforward to adapt it to support both java.io.File and
> > > > > > java.nio.file.Path files. Commons-io 2.6 seems to be the first
> > > > > > version to require Java 7, which introduced NIO2. I think others
> > > > > > would appreciate the NIO2 constructors, saving a call to
> > > > > > Path#toFile(), even if they're not using non-default file
> > > > > > systems. I previously created a JIRA issue IO-578
> > > > > > <https://issues.apache.org/jira/browse/IO-578> and a GitHub pull
> > > > > > request #62. I feel the PR is of very high quality, short and to
> > > > > > the point, ready or nearly ready to merge. I would appreciate any
> > > > > > feedback I can get on the JIRA issue or GitHub PR. I'm hopeful
> > > > > > this could make commons-io 2.7. Thanks for your attention and
> > > > > > consideration.
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Mark
> > > > >
> > > > --
> > > > Matt Sicker 
> > > >
> >
> >
> >
> > --
> > Matt Sicker 
> >
>
>


Re: [io] NIO2 and non-default file system support

2019-05-16 Thread Matt Sicker
I previously noted my ideas in https://issues.apache.org/jira/browse/IO-595

On Thu, 16 May 2019 at 14:53, Chesney, Mark  wrote:
>
> Thank you both. I'd be willing to make some more contributions towards 
> modernization of the API if you have some idea where the greatest need or 
> highest value is. Are there any existing JIRA issues that come to mind?
>
> Regards,
> Mark
>
> > -----Original Message-----
> > From: Matt Sicker 
> > Sent: Thursday, May 16, 2019 8:22 AM
> > To: Commons Developers List 
> > Subject: Re: [io] NIO2 and non-default file system support
> >
> > Thanks, Gary!
> >
> > On Thu, 16 May 2019 at 05:37, Gary Gregory  wrote:
> > >
> > > The one patch has been merged.
> > >
> > > Gary
> > >
> > > On Thu, May 16, 2019, 02:03 Matt Sicker  wrote:
> > >
> > > > I’ve been interested in seeing IO be updated to include support for
> > > > using Path instead of just File. I can review your PR this week,
> > > > though I’m certainly not the only one who can.
> > > >
> > > > On Tue, May 14, 2019 at 13:19, Chesney, Mark
> > > > 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > A while back I ran into a situation where I needed to read the
> > > > > > lines of a file that might be on a non-default file system, like
> > > > > > an in-memory file system, on Java 7+. I looked to the commons-io
> > > > > > ReversedLinesFileReader, but it only works with java.io.File
> > > > > > files, which are always on the default file system. I duplicated
> > > > > > the class in my project and found it was relatively
> > > > > > straightforward to adapt it to support both java.io.File and
> > > > > > java.nio.file.Path files. Commons-io 2.6 seems to be the first
> > > > > > version to require Java 7, which introduced NIO2. I think others
> > > > > > would appreciate the NIO2 constructors, saving a call to
> > > > > > Path#toFile(), even if they're not using non-default file
> > > > > > systems. I previously created a JIRA issue IO-578
> > > > > > <https://issues.apache.org/jira/browse/IO-578> and a GitHub pull
> > > > > > request #62. I feel the PR is of very high quality, short and to
> > > > > > the point, ready or nearly ready to merge. I would appreciate any
> > > > > > feedback I can get on the JIRA issue or GitHub PR. I'm hopeful
> > > > > > this could make commons-io 2.7. Thanks for your attention and
> > > > > > consideration.
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Mark
> > > > >
> > > > --
> > > > Matt Sicker 
> > > >
> >
> >
> >
> > --
> > Matt Sicker 
> >
>


-- 
Matt Sicker 




RE: [io] NIO2 and non-default file system support

2019-05-16 Thread Chesney, Mark
Thank you both. I'd be willing to make some more contributions towards 
modernization of the API if you have some idea where the greatest need or 
highest value is. Are there any existing JIRA issues that come to mind?

Regards,
Mark
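
As background to the Path-versus-File point raised in this thread, the sketch below shows the general overload pattern: the `Path` method does the work, and the `File` method delegates to it, so callers on non-default file systems are served without breaking existing code. It is an assumption for illustration and is not the code from IO-578; `LastLineReader` and `lastLine` are hypothetical names, and reading the whole file is a deliberate simplification of what a reversed-lines reader would do.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper illustrating Path-first overloads. */
final class LastLineReader {
    private LastLineReader() { }

    /** Path-based entry point: works with any installed file system provider. */
    static String lastLine(Path path, Charset charset) {
        try {
            // Simplification: loads every line; a real reader would scan backwards.
            List<String> lines = Files.readAllLines(path, charset);
            return lines.isEmpty() ? null : lines.get(lines.size() - 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** File-based overload kept for existing callers; delegates to the Path version. */
    static String lastLine(File file, Charset charset) {
        return lastLine(file.toPath(), charset);
    }
}
```

Going the other way (a `Path` API that calls `Path#toFile()` internally) would throw `UnsupportedOperationException` for paths that do not belong to the default provider, which is why the delegation runs from `File` to `Path`.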

> -----Original Message-----
> From: Matt Sicker 
> Sent: Thursday, May 16, 2019 8:22 AM
> To: Commons Developers List 
> Subject: Re: [io] NIO2 and non-default file system support
> 
> Thanks, Gary!
> 
> On Thu, 16 May 2019 at 05:37, Gary Gregory  wrote:
> >
> > The one patch has been merged.
> >
> > Gary
> >
> > On Thu, May 16, 2019, 02:03 Matt Sicker  wrote:
> >
> > > I’ve been interested in seeing IO be updated to include support for
> > > using Path instead of just File. I can review your PR this week,
> > > though I’m certainly not the only one who can.
> > >
> > > On Tue, May 14, 2019 at 13:19, Chesney, Mark
> > > 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > > A while back I ran into a situation where I needed to read the
> > > > > lines of a file that might be on a non-default file system, like
> > > > > an in-memory file system, on Java 7+. I looked to the commons-io
> > > > > ReversedLinesFileReader, but it only works with java.io.File
> > > > > files, which are always on the default file system. I duplicated
> > > > > the class in my project and found it was relatively
> > > > > straightforward to adapt it to support both java.io.File and
> > > > > java.nio.file.Path files. Commons-io 2.6 seems to be the first
> > > > > version to require Java 7, which introduced NIO2. I think others
> > > > > would appreciate the NIO2 constructors, saving a call to
> > > > > Path#toFile(), even if they're not using non-default file
> > > > > systems. I previously created a JIRA issue IO-578
> > > > > <https://issues.apache.org/jira/browse/IO-578> and a GitHub pull
> > > > > request #62. I feel the PR is of very high quality, short and to
> > > > > the point, ready or nearly ready to merge. I would appreciate any
> > > > > feedback I can get on the JIRA issue or GitHub PR. I'm hopeful
> > > > > this could make commons-io 2.7. Thanks for your attention and
> > > > > consideration.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Mark
> > > >
> > > --
> > > Matt Sicker 
> > >
> 
> 
> 
> --
> Matt Sicker 
> 



RE: [IO] Update to Java 8

2019-05-16 Thread Chesney, Mark
I think it's a good idea. A recent update of the parent POM version seems to 
have broken the Travis build for Java 7. Discontinuing Java 7 support would 
make that a non-issue.

> -----Original Message-----
> From: Rob Tompkins 
> Sent: Thursday, May 16, 2019 8:10 AM
> To: Commons Developers List 
> Subject: Re: [IO] Update to Java 8
> 
> 
> On 5/15/2019 2:42 PM, Gary Gregory wrote:
> > Hi all,
> >
> > Time to update to Java 8 methinks.
> 
> +1
> 
> >
> > Gary
> >
> 



Re: [io] NIO2 and non-default file system support

2019-05-16 Thread Matt Sicker
Thanks, Gary!

On Thu, 16 May 2019 at 05:37, Gary Gregory  wrote:
>
> The one patch has been merged.
>
> Gary
>
> On Thu, May 16, 2019, 02:03 Matt Sicker  wrote:
>
> > I’ve been interested in seeing IO be updated to include support for using
> > Path instead of just File. I can review your PR this week, though I’m
> > certainly not the only one who can.
> >
> > On Tue, May 14, 2019 at 13:19, Chesney, Mark 
> > wrote:
> >
> > > Hello,
> > >
> > > > A while back I ran into a situation where I needed to read the
> > > > lines of a file that might be on a non-default file system, like
> > > > an in-memory file system, on Java 7+. I looked to the commons-io
> > > > ReversedLinesFileReader, but it only works with java.io.File
> > > > files, which are always on the default file system. I duplicated
> > > > the class in my project and found it was relatively
> > > > straightforward to adapt it to support both java.io.File and
> > > > java.nio.file.Path files. Commons-io 2.6 seems to be the first
> > > > version to require Java 7, which introduced NIO2. I think others
> > > > would appreciate the NIO2 constructors, saving a call to
> > > > Path#toFile(), even if they're not using non-default file
> > > > systems. I previously created a JIRA issue IO-578
> > > > <https://issues.apache.org/jira/browse/IO-578> and a GitHub pull
> > > > request #62. I feel the PR is of very high quality, short and to
> > > > the point, ready or nearly ready to merge. I would appreciate any
> > > > feedback I can get on the JIRA issue or GitHub PR. I'm hopeful
> > > > this could make commons-io 2.7. Thanks for your attention and
> > > > consideration.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Mark
> > >
> > --
> > Matt Sicker 
> >



-- 
Matt Sicker 




Re: [IO] Update to Java 8

2019-05-16 Thread Rob Tompkins



On 5/15/2019 2:42 PM, Gary Gregory wrote:
> Hi all,
>
> Time to update to Java 8 methinks.

+1

> Gary






Re: [rng] stress test results

2019-05-16 Thread Alex Herbert


> On 16 May 2019, at 15:33, Gilles Sadowski  wrote:
> 
> Hi.
> 
> On Thu, 16 May 2019 at 16:04, Alex Herbert wrote:
>>
>>
>>
>>> On 16 May 2019, at 14:42, Gilles Sadowski wrote:
>>>
>>> Hello.
>>>
>>> On Thu, 16 May 2019 at 12:06, Alex Herbert wrote:
 
>>>> I have run the stress test using the new application. The new application
>>>> has two major changes over the previous application:
>>>>
>>>> 1. It detects the platform byte-order and sends the bits in the correct
>>>> order to be read by a C application
>>>> 2. The bridge to TestU01 has been updated to use all the input int values,
>>>> previously it was using every other int value
>>>>
>>>> So we can expect differences from both test suites Dieharder and TestU01
>>>> BigCrush.
>>>>
>>>> For reference here are the old results (from the user guide, reordered to
>>>> the RandomSource enum order):
>>>>
>>>> RNG                     Dieharder       TestU01 (BigCrush)
>>>> JDK                     11, 12, 13      74, 72, 75
>>>> WELL_512_A              0, 0, 0         7, 6, 6
>>>> WELL_1024_A             0, 0, 0         4, 4, 5
>>>> WELL_19937_A            0, 0, 0         3, 2, 3
>>>> WELL_19937_C            0, 1, 0         2, 2, 3
>>>> WELL_44497_A            0, 0, 0         2, 3, 3
>>>> WELL_44497_B            0, 0, 0         2, 2, 2
>>>> MT                      0, 1, 0         3, 2, 2
>>>> ISAAC                   0, 0, 1         0, 1, 0
>>>> SPLIT_MIX_64            0, 0, 0         2, 0, 0
>>>> XOR_SHIFT_1024_S        0, 0, 0         2, 0, 0
>>>> TWO_CMRES               1, 1, 1         0, 0, 1
>>>> MT_64                   0, 0, 1         3, 2, 3
>>>> MWC_256                 0, 0, 0         0, 0, 0
>>>> KISS                    0, 0, 0         1, 2, 0
>>>>
>>>> Here are the new results:
>>>>
>>>> RNG                     Dieharder       TestU01 (BigCrush)
>>>> JDK                     4,4,4,4,4       74,72,74,73,74
>>>> WELL_512_A              0,0,0,0,0       7,6,6,6,6
>>>> WELL_1024_A             0,0,0,0,0       4,4,5,4,4
>>>> WELL_19937_A            0,1,0,0,1       3,3,2,2,2
>>>> WELL_19937_C            0,0,0,0,0       2,2,3,2,2
>>>> WELL_44497_A            0,0,0,0,0       2,2,2,2,3
>>>> WELL_44497_B            0,0,0,0,0       2,3,2,2,2
>>>> MT                      0,0,0,0,0       2,3,2,2,2
>>>> ISAAC                   0,0,0,0,0       0,1,2,0,0
>>>> SPLIT_MIX_64            0,0,0,0,0       1,0,0,0,0
>>>> XOR_SHIFT_1024_S        0,0,0,0,0       0,0,0,0,0
>>>> TWO_CMRES               2,2,2,2,2       4,3,3,5,4
>>>> MT_64                   0,0,0,0,0       2,3,2,2,2
>>>> MWC_256                 0,1,0,0,0       0,0,0,2,0
>>>> KISS                    0,0,0,0,0       0,0,0,0,0
>>>> XOR_SHIFT_1024_S_PHI    0,0,0,0,0       0,0,0,0,0
>>>> XO_RO_SHI_RO_64_S       0,0,0,0,0       1,1,2,1,3
>>>> XO_RO_SHI_RO_64_SS      0,0,0,0,0       0,0,0,0,0
>>>> XO_SHI_RO_128_PLUS      0,0,1,0,0       1,2,2,1,1
>>>> XO_SHI_RO_128_SS        0,0,0,1,0       0,1,0,0,0
>>>> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0       0,1,0,0,0
>>>> XO_RO_SHI_RO_128_SS     0,0,0,0,0       1,0,1,0,0
>>>> XO_SHI_RO_256_PLUS      0,1,0,0,0       0,0,0,0,0
>>>> XO_SHI_RO_256_SS        0,0,0,0,0       0,1,0,2,1
>>>> XO_SHI_RO_512_PLUS      0,0,0,0,1       0,0,0,2,2
>>>> XO_SHI_RO_512_SS        0,0,0,0,0       0,1,0,1,0
>>>>
>>>> (Note: All of the single fails except one under Dieharder are for the
>>>> flawed diehard_sums test. I include it here for direct comparison with old
>>>> results. I would recommend we strip this from the new results for the user
>>>> guide.)
>>>>
>>>> I ran them 3 times. Then because the results were different (mainly for
>>>> the JDK generator for Dieharder) I double-checked everything and ran
>>>> another 2. Results are still the same. Dieharder is much better for the
>>>> JDK than previously. It systematically fails:
>>>>
>>>> diehard_opso:0
>>>> diehard_oqso:0
>>>> diehard_dna:0
>>>> dab_bytedistrib:0
>>>>
>>>> The TWO_CMRES generator is now worse as it is systematically failing:
>>>>
>>>> diehard_oqso:0
>>>> diehard_dna:0
>>>>
>>>> The results from BigCrush are similar for JDK and all the others except
>>>> TWO_CMRES. This is now failing a few more tests. It systematically fails:
>>>>
>>>> 1  SerialOver, r = 0
>>>> 41  Permutation, t = 5
>>>> 42  Permutation, t = 7
>>>>
>>>> To check the JDK results for Dieharder I ran it 5 times using the wrong
>>>> platform byte order (i.e. what the previous test application was doing).
>>>>
>>>> Old results: 11, 12, 13
>>>> New results: 11,16,14,14,15
>>>>
>>>> So this matches up. If the JDK output is byte reversed it is a poor
>>>> generator.
>>>>
>>>> A few sources I have read indicate that BigCrush favours the upper bits of
>>>> a generator. A test should therefore run the generator bit reversed
>>>> through the test 

Re: [rng] stress test results

2019-05-16 Thread Gilles Sadowski
Hi.

On Thu, 16 May 2019 at 16:04, Alex Herbert wrote:
>
>
>
> > On 16 May 2019, at 14:42, Gilles Sadowski  wrote:
> >
> > Hello.
> >
> > On Thu, 16 May 2019 at 12:06, Alex Herbert wrote:
> >>
> >> I have run the stress test using the new application. The new application 
> >> has two major changes over the previous application:
> >>
> >> 1. It detects the platform byte-order and sends the bits in the correct 
> >> order to be read by a C application
> >> 2. The bridge to TestU01 has been updated to use all the input int values, 
> >> previously it was using every other int value
> >>
> >> So we can expect differences from both test suites Dieharder and TestU01 
> >> BigCrush.
> >>
> >> For reference here are the old results (from the user guide, reordered to 
> >> the RandomSource enum order):
> >>
> >> RNG                     Dieharder       TestU01 (BigCrush)
> >> JDK                     11, 12, 13      74, 72, 75
> >> WELL_512_A              0, 0, 0         7, 6, 6
> >> WELL_1024_A             0, 0, 0         4, 4, 5
> >> WELL_19937_A            0, 0, 0         3, 2, 3
> >> WELL_19937_C            0, 1, 0         2, 2, 3
> >> WELL_44497_A            0, 0, 0         2, 3, 3
> >> WELL_44497_B            0, 0, 0         2, 2, 2
> >> MT                      0, 1, 0         3, 2, 2
> >> ISAAC                   0, 0, 1         0, 1, 0
> >> SPLIT_MIX_64            0, 0, 0         2, 0, 0
> >> XOR_SHIFT_1024_S        0, 0, 0         2, 0, 0
> >> TWO_CMRES               1, 1, 1         0, 0, 1
> >> MT_64                   0, 0, 1         3, 2, 3
> >> MWC_256                 0, 0, 0         0, 0, 0
> >> KISS                    0, 0, 0         1, 2, 0
> >>
> >> Here are the new results:
> >>
> >> RNG                     Dieharder       TestU01 (BigCrush)
> >> JDK                     4,4,4,4,4       74,72,74,73,74
> >> WELL_512_A              0,0,0,0,0       7,6,6,6,6
> >> WELL_1024_A             0,0,0,0,0       4,4,5,4,4
> >> WELL_19937_A            0,1,0,0,1       3,3,2,2,2
> >> WELL_19937_C            0,0,0,0,0       2,2,3,2,2
> >> WELL_44497_A            0,0,0,0,0       2,2,2,2,3
> >> WELL_44497_B            0,0,0,0,0       2,3,2,2,2
> >> MT                      0,0,0,0,0       2,3,2,2,2
> >> ISAAC                   0,0,0,0,0       0,1,2,0,0
> >> SPLIT_MIX_64            0,0,0,0,0       1,0,0,0,0
> >> XOR_SHIFT_1024_S        0,0,0,0,0       0,0,0,0,0
> >> TWO_CMRES               2,2,2,2,2       4,3,3,5,4
> >> MT_64                   0,0,0,0,0       2,3,2,2,2
> >> MWC_256                 0,1,0,0,0       0,0,0,2,0
> >> KISS                    0,0,0,0,0       0,0,0,0,0
> >> XOR_SHIFT_1024_S_PHI    0,0,0,0,0       0,0,0,0,0
> >> XO_RO_SHI_RO_64_S       0,0,0,0,0       1,1,2,1,3
> >> XO_RO_SHI_RO_64_SS      0,0,0,0,0       0,0,0,0,0
> >> XO_SHI_RO_128_PLUS      0,0,1,0,0       1,2,2,1,1
> >> XO_SHI_RO_128_SS        0,0,0,1,0       0,1,0,0,0
> >> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0       0,1,0,0,0
> >> XO_RO_SHI_RO_128_SS     0,0,0,0,0       1,0,1,0,0
> >> XO_SHI_RO_256_PLUS      0,1,0,0,0       0,0,0,0,0
> >> XO_SHI_RO_256_SS        0,0,0,0,0       0,1,0,2,1
> >> XO_SHI_RO_512_PLUS      0,0,0,0,1       0,0,0,2,2
> >> XO_SHI_RO_512_SS        0,0,0,0,0       0,1,0,1,0
> >>
> >> (Note: All of the single fails except one under Dieharder are for the 
> >> flawed diehard_sums test. I include it here for direct comparison with old 
> >> results. I would recommend we strip this from the new results for the user 
> >> guide.)
> >>
> >> I ran them 3 times. Then because the results were different (mainly for 
> >> the JDK generator for Dieharder) I double-checked everything and ran 
> >> another 2. Results are still the same. Dieharder is much better for the 
> >> JDK than previously. It systematically fails:
> >>
> >> diehard_opso:0
> >> diehard_oqso:0
> >> diehard_dna:0
> >> dab_bytedistrib:0
> >>
> >> The TWO_CMRES generator is now worse as it is systematically failing:
> >>
> >> diehard_oqso:0
> >> diehard_dna:0
> >>
> >> The results from BigCrush are similar for JDK and all the others except 
> >> TWO_CMRES. This is now failing a few more tests. It systematically fails:
> >>
> >> 1  SerialOver, r = 0
> >> 41  Permutation, t = 5
> >> 42  Permutation, t = 7
> >>
> >> To check the JDK results for Dieharder I ran it 5 times using the wrong 
> >> platform byte order (i.e. what the previous test application was doing).
> >>
> >> Old results : 11, 12, 13
> >> New results: 11,16,14,14,15
> >>
> >> So this matches up. If the JDK output is byte reversed it is a poor 
> >> generator.
> >>
> >> A few sources I have read indicate that BigCrush favours the upper bits of 
> >> a generator. A test should therefore run the generator bit reversed 
> >> through the test application. Here are the full forward and backward 
> >> results ignoring the Diehard sums test:
> >>
> >> RNG                     Bit-reversed   Dieharder        TestU01 (BigCrush)
> >> JDK false   

Re: [rng] stress test results

2019-05-16 Thread Alex Herbert


> On 16 May 2019, at 14:42, Gilles Sadowski  wrote:
> 
> Hello.
> 
> On Thu, 16 May 2019 at 12:06, Alex Herbert wrote:
>> 
>> I have run the stress test using the new application. The new application 
>> has two major changes over the previous application:
>> 
>> 1. It detects the platform byte-order and sends the bits in the correct 
>> order to be read by a C application
>> 2. The bridge to TestU01 has been updated to use all the input int values, 
>> previously it was using every other int value
>> 
>> So we can expect differences from both test suites Dieharder and TestU01 
>> BigCrush.
>> 
>> For reference here are the old results (from the user guide, reordered to 
>> the RandomSource enum order):
>> 
>> RNG                     Dieharder       TestU01 (BigCrush)
>> JDK                     11, 12, 13      74, 72, 75
>> WELL_512_A              0, 0, 0         7, 6, 6
>> WELL_1024_A             0, 0, 0         4, 4, 5
>> WELL_19937_A            0, 0, 0         3, 2, 3
>> WELL_19937_C            0, 1, 0         2, 2, 3
>> WELL_44497_A            0, 0, 0         2, 3, 3
>> WELL_44497_B            0, 0, 0         2, 2, 2
>> MT                      0, 1, 0         3, 2, 2
>> ISAAC                   0, 0, 1         0, 1, 0
>> SPLIT_MIX_64            0, 0, 0         2, 0, 0
>> XOR_SHIFT_1024_S        0, 0, 0         2, 0, 0
>> TWO_CMRES               1, 1, 1         0, 0, 1
>> MT_64                   0, 0, 1         3, 2, 3
>> MWC_256                 0, 0, 0         0, 0, 0
>> KISS                    0, 0, 0         1, 2, 0
>>
>> Here are the new results:
>>
>> RNG                     Dieharder       TestU01 (BigCrush)
>> JDK                     4,4,4,4,4       74,72,74,73,74
>> WELL_512_A              0,0,0,0,0       7,6,6,6,6
>> WELL_1024_A             0,0,0,0,0       4,4,5,4,4
>> WELL_19937_A            0,1,0,0,1       3,3,2,2,2
>> WELL_19937_C            0,0,0,0,0       2,2,3,2,2
>> WELL_44497_A            0,0,0,0,0       2,2,2,2,3
>> WELL_44497_B            0,0,0,0,0       2,3,2,2,2
>> MT                      0,0,0,0,0       2,3,2,2,2
>> ISAAC                   0,0,0,0,0       0,1,2,0,0
>> SPLIT_MIX_64            0,0,0,0,0       1,0,0,0,0
>> XOR_SHIFT_1024_S        0,0,0,0,0       0,0,0,0,0
>> TWO_CMRES               2,2,2,2,2       4,3,3,5,4
>> MT_64                   0,0,0,0,0       2,3,2,2,2
>> MWC_256                 0,1,0,0,0       0,0,0,2,0
>> KISS                    0,0,0,0,0       0,0,0,0,0
>> XOR_SHIFT_1024_S_PHI    0,0,0,0,0       0,0,0,0,0
>> XO_RO_SHI_RO_64_S       0,0,0,0,0       1,1,2,1,3
>> XO_RO_SHI_RO_64_SS      0,0,0,0,0       0,0,0,0,0
>> XO_SHI_RO_128_PLUS      0,0,1,0,0       1,2,2,1,1
>> XO_SHI_RO_128_SS        0,0,0,1,0       0,1,0,0,0
>> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0       0,1,0,0,0
>> XO_RO_SHI_RO_128_SS     0,0,0,0,0       1,0,1,0,0
>> XO_SHI_RO_256_PLUS      0,1,0,0,0       0,0,0,0,0
>> XO_SHI_RO_256_SS        0,0,0,0,0       0,1,0,2,1
>> XO_SHI_RO_512_PLUS      0,0,0,0,1       0,0,0,2,2
>> XO_SHI_RO_512_SS        0,0,0,0,0       0,1,0,1,0
>> 
>> (Note: All of the single fails except one under Dieharder are for the flawed 
>> diehard_sums test. I include it here for direct comparison with old results. 
>> I would recommend we strip this from the new results for the user guide.)
>> 
>> I ran them 3 times. Then because the results were different (mainly for the 
>> JDK generator for Dieharder) I double-checked everything and ran another 2. 
>> Results are still the same. Dieharder is much better for the JDK than 
>> previously. It systematically fails:
>> 
>> diehard_opso:0
>> diehard_oqso:0
>> diehard_dna:0
>> dab_bytedistrib:0
>> 
>> The TWO_CMRES generator is now worse as it is systematically failing:
>> 
>> diehard_oqso:0
>> diehard_dna:0
>> 
>> The results from BigCrush are similar for JDK and all the others except 
>> TWO_CMRES. This is now failing a few more tests. It systematically fails:
>> 
>> 1  SerialOver, r = 0
>> 41  Permutation, t = 5
>> 42  Permutation, t = 7
>> 
>> To check the JDK results for Dieharder I ran it 5 times using the wrong 
>> platform byte order (i.e. what the previous test application was doing).
>> 
>> Old results : 11, 12, 13
>> New results: 11,16,14,14,15
>> 
>> So this matches up. If the JDK output is byte reversed it is a poor 
>> generator.
>> 
>> A few sources I have read indicate that BigCrush favours the upper bits of a 
>> generator. A test should therefore run the generator bit reversed through 
>> the test application. Here are the full forward and backward results 
>> ignoring the Diehard sums test:
>> 
>> RNG                     Bit-reversed   Dieharder        TestU01 (BigCrush)
>> JDK                     false          4,4,4,4,4        74,72,74,73,74
>> JDK                     true           42,42,43,49,49   35,34,35,36,36
>> WELL_512_A              false          0,0,0,0,0        7,6,6,6,6
>> WELL_512_A              true           0,0,1,0,0        7,6,6,7,6
>> WELL_1024_A 

Re: [rng] stress test results

2019-05-16 Thread Gilles Sadowski
Hello.

On Thu, 16 May 2019 at 12:06, Alex Herbert wrote:
>
> I have run the stress test using the new application. The new application has 
> two major changes over the previous application:
>
> 1. It detects the platform byte-order and sends the bits in the correct order 
> to be read by a C application
> 2. The bridge to TestU01 has been updated to use all the input int values, 
> previously it was using every other int value
>
> So we can expect differences from both test suites Dieharder and TestU01 
> BigCrush.
>
> For reference here are the old results (from the user guide, reordered to the 
> RandomSource enum order):
>
> RNG                 Dieharder     TestU01 (BigCrush)
> JDK                 11, 12, 13    74, 72, 75
> WELL_512_A          0, 0, 0       7, 6, 6
> WELL_1024_A         0, 0, 0       4, 4, 5
> WELL_19937_A        0, 0, 0       3, 2, 3
> WELL_19937_C        0, 1, 0       2, 2, 3
> WELL_44497_A        0, 0, 0       2, 3, 3
> WELL_44497_B        0, 0, 0       2, 2, 2
> MT                  0, 1, 0       3, 2, 2
> ISAAC               0, 0, 1       0, 1, 0
> SPLIT_MIX_64        0, 0, 0       2, 0, 0
> XOR_SHIFT_1024_S    0, 0, 0       2, 0, 0
> TWO_CMRES           1, 1, 1       0, 0, 1
> MT_64               0, 0, 1       3, 2, 3
> MWC_256             0, 0, 0       0, 0, 0
> KISS                0, 0, 0       1, 2, 0
>
> Here are the new results:
>
> RNG                     Dieharder   TestU01 (BigCrush)
> JDK                     4,4,4,4,4   74,72,74,73,74
> WELL_512_A              0,0,0,0,0   7,6,6,6,6
> WELL_1024_A             0,0,0,0,0   4,4,5,4,4
> WELL_19937_A            0,1,0,0,1   3,3,2,2,2
> WELL_19937_C            0,0,0,0,0   2,2,3,2,2
> WELL_44497_A            0,0,0,0,0   2,2,2,2,3
> WELL_44497_B            0,0,0,0,0   2,3,2,2,2
> MT                      0,0,0,0,0   2,3,2,2,2
> ISAAC                   0,0,0,0,0   0,1,2,0,0
> SPLIT_MIX_64            0,0,0,0,0   1,0,0,0,0
> XOR_SHIFT_1024_S        0,0,0,0,0   0,0,0,0,0
> TWO_CMRES               2,2,2,2,2   4,3,3,5,4
> MT_64                   0,0,0,0,0   2,3,2,2,2
> MWC_256                 0,1,0,0,0   0,0,0,2,0
> KISS                    0,0,0,0,0   0,0,0,0,0
> XOR_SHIFT_1024_S_PHI    0,0,0,0,0   0,0,0,0,0
> XO_RO_SHI_RO_64_S       0,0,0,0,0   1,1,2,1,3
> XO_RO_SHI_RO_64_SS      0,0,0,0,0   0,0,0,0,0
> XO_SHI_RO_128_PLUS      0,0,1,0,0   1,2,2,1,1
> XO_SHI_RO_128_SS        0,0,0,1,0   0,1,0,0,0
> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0   0,1,0,0,0
> XO_RO_SHI_RO_128_SS     0,0,0,0,0   1,0,1,0,0
> XO_SHI_RO_256_PLUS      0,1,0,0,0   0,0,0,0,0
> XO_SHI_RO_256_SS        0,0,0,0,0   0,1,0,2,1
> XO_SHI_RO_512_PLUS      0,0,0,0,1   0,0,0,2,2
> XO_SHI_RO_512_SS        0,0,0,0,0   0,1,0,1,0
>
> (Note: All of the single fails except one under Dieharder are for the flawed 
> diehard_sums test. I include it here for direct comparison with old results. 
> I would recommend we strip this from the new results for the user guide.)
>
> I ran them 3 times. Then, because the results were different (mainly for the 
> JDK generator for Dieharder), I double-checked everything and ran another 2. 
> Results are still the same. The JDK generator does much better on Dieharder 
> than previously, but it still systematically fails:
>
> diehard_opso:0
> diehard_oqso:0
> diehard_dna:0
> dab_bytedistrib:0
>
> The TWO_CMRES generator is now worse as it is systematically failing:
>
> diehard_oqso:0
> diehard_dna:0
>
> The results from BigCrush are similar for JDK and all the others except 
> TWO_CMRES. This is now failing a few more tests. It systematically fails:
>
> 1  SerialOver, r = 0
> 41  Permutation, t = 5
> 42  Permutation, t = 7
>
> To check the JDK results for Dieharder I ran it 5 times using the wrong 
> platform byte order (i.e. what the previous test application was doing).
>
> Old results : 11, 12, 13
> New results: 11,16,14,14,15
>
> So this matches up. If the JDK output is byte reversed it is a poor generator.
>
> A few sources I have read indicate that BigCrush favours the upper bits of a 
> generator. A test should therefore run the generator bit reversed through the 
> test application. Here are the full forward and backward results ignoring the 
> Diehard sums test:
>
> RNG            Bit-reversed   Dieharder        TestU01 (BigCrush)
> JDK            false          4,4,4,4,4        74,72,74,73,74
> JDK            true           42,42,43,49,49   35,34,35,36,36
> WELL_512_A     false          0,0,0,0,0        7,6,6,6,6
> WELL_512_A     true           0,0,1,0,0        7,6,6,7,6
> WELL_1024_A    false          0,0,0,0,0        4,4,5,4,4
> WELL_1024_A    true           0,0,0,0,0        4,4,4,4,4
> WELL_19937_A   false          0,1,0,0,0        3,3,2,2,2
> WELL_19937_A   true           0,0,0,0,0 

Re: [GSoC] commons-gsoc Thursday meeting?

2019-05-16 Thread Gilles Sadowski
Hi Eric.

I won't be able to attend (but I've already provided comments on the ML).

Best,
Gilles

Le mar. 14 mai 2019 à 18:57, Rob Tompkins  a écrit :
>
>
> On 5/14/2019 12:47 PM, Eric Barnhill wrote:
> > Should we have another Slack meeting at the same time this Thursday, 5pm
> > UTC (9am California time)?
>
>> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [GSoC][STATISTICS][Regression] Architecture Implementation Suggestions

2019-05-16 Thread Gilles Sadowski
Hello.

Le jeu. 16 mai 2019 à 10:02, Ben Nguyen  a écrit :
>
> Hello,
>
>
>
> I have some broad general ideas about how the regression module should be 
> structured, as outlined in my proposal briefly with UMLs
>
> This is the current implementation inside commons-math-stat-regression:

It seems there is/was an image here but I don't see it.

For this kind of information, please use JIRA (and provide the link here).

>
>
> This is my proposed idea, where the structure was partly inspired by SuanShu 
> since it supported multiple types of regression (including logistic):
>
> https://github.com/aaiyer/SuanShu/tree/master/src/main/java/com/numericalmethod/suanshu/stats/regression/linear
>
>
>
> Disclaimer: I have only studied some econometrics and second year computer 
> science in university, so I have zero professional data engineering 
> experience, but am excited to start learning with this project. So, I don’t 
> currently know the exact needs of data engineers in regards to this module 
> and am learning as I go….which is why I would very much appreciate any input 
> on the kinds of requirements data engineers would want from this regression 
> module.

Basing a design on use-cases is very useful.
You should collect a range of them (small/large datasets, in-memory/stream,
dense/sparse) in order to figure out which parts of the code can be common and
which require specialization.

>
> From someone who has used the current implementation or will use this new 
> implementation:
>
> What would make your life easier?
> What should definitely be kept?
> What should be added/improved?
> Any specific features or design criteria?
> Any changes or radically different approaches to the following idea?

Good questions!
What are your answers? ;-)

> Note: OLS, GLS and Logistic regression are the first to be implemented, with 
> focus to make architectural support for further additions. Changes will make 
> use of new Java 8 features, specifically the Java Streams API to improve 
> performance and readability.
>

+1
I'd suggest selecting one and starting to code, without fearing that you'll
probably have to change a lot of it as more use-cases are collected.

>
>
> Updates to this proposed implementation UML in my proposal:
>
> “statistics-regression-reqLinearMath” will be replaced with EJML as suggested 
> by Mr. Eric Barnhill
>
> This will include a custom matrix class extended from EJML’s SimpleBase -> 
> StatisticsMatrix
> So if we decide to use an Apache Commons implementation of matrices later on, 
> only this class should be changed internally.

Good precaution; but I doubt that we can include everything in a
single class.
How to best encapsulate the linear algebra (external) library is a
subject on its own, worth its own thread:  Cramming many questions
in a single post makes it likely that some will be missed by some
people who might later on question the chosen path.  [External
dependencies are a sensitive issue in Commons...]
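
To make the encapsulation idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the class shape and method names are hypothetical, and a plain double[][] stands in for the external library so that the example compiles without EJML on the classpath. The point is only that regression code would talk to this type and never to the backend directly; as noted above, a real module would very likely need more than one such class.

```java
/**
 * Hypothetical facade: regression code would depend only on this class,
 * never on the linear algebra library behind it.
 */
final class StatisticsMatrix {
    // Backend detail. A plain array stands in for the external library here.
    private final double[][] data;

    private StatisticsMatrix(double[][] data) {
        this.data = data;
    }

    /** Copies the rows so later changes to the caller's array are not seen. */
    static StatisticsMatrix of(double[][] rows) {
        if (rows.length == 0 || rows[0].length == 0) {
            throw new IllegalArgumentException("matrix must not be empty");
        }
        double[][] copy = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != rows[0].length) {
                throw new IllegalArgumentException("ragged row " + i);
            }
            copy[i] = rows[i].clone();
        }
        return new StatisticsMatrix(copy);
    }

    int rows() {
        return data.length;
    }

    int cols() {
        return data[0].length;
    }

    double get(int row, int col) {
        return data[row][col];
    }

    StatisticsMatrix transpose() {
        double[][] t = new double[cols()][rows()];
        for (int r = 0; r < rows(); r++) {
            for (int c = 0; c < cols(); c++) {
                t[c][r] = data[r][c];
            }
        }
        return new StatisticsMatrix(t);
    }

    StatisticsMatrix multiply(StatisticsMatrix other) {
        if (cols() != other.rows()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[][] p = new double[rows()][other.cols()];
        for (int i = 0; i < rows(); i++) {
            for (int j = 0; j < other.cols(); j++) {
                for (int k = 0; k < cols(); k++) {
                    p[i][j] += data[i][k] * other.data[k][j];
                }
            }
        }
        return new StatisticsMatrix(p);
    }

    public static void main(String[] args) {
        StatisticsMatrix x = StatisticsMatrix.of(new double[][] {{1, 2}, {3, 4}});
        // X'X, the product that appears in the OLS normal equations.
        StatisticsMatrix xtx = x.transpose().multiply(x);
        System.out.println(xtx.get(0, 0) + " " + xtx.get(0, 1));
    }
}
```

Swapping the backend would then mean rewriting the private field and the method bodies, while callers keep using of(), transpose() and multiply().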

Also, a reminder that we need to take into account the comparative
benchmarks which I posted recently.  [Even if just to conclude that
EJML has overwhelming advantages (which?) that make it more
suitable than its "competitors".]

>
> Abstract classes should have interfaces above them or perhaps just be 
> interfaces if a simpler approach is implemented (ie minimal OOP)
>
> Notes about this proposed implementation:
>
> AbstractVariables and its child classes may not be necessary, i.e. just 
> Estimators and Residuals classes
> Or perhaps it’s best to follow the current implementation’s example and have 
> a single class per regression type for hierarchy simplicity (but risking 
> redundancies)?
> I have not looked into specific data members or individual methods yet. So 
> far just taking notes from the current implementation and SuanShu
> The “statistics-regression-updating” components have quite complex algorithms 
> which will require a lot of time for me to understand completely
>
> So for now, I see myself making minimal changes to them, prioritizing the new 
> “stored” components.

IMHO, this will be better discussed once an initial implementation is shown
(or perhaps, as Eric suggested, with unit tests).

Again, better to start a new thread for each specific question, possibly backed
with a new JIRA report focussed on a particular task (see "Create sub-tasks"
on JIRA).

>
> RegressionDataLoader’s purpose is to:
>
> provide a clean input interface
> and to ensure that data from say double[ ][ ] is only converted to working 
> form as a StatisticsMatrix object once

Until proven wrong, I'm a proponent of separating I/O from "useful"
computations.
I.e. I suggest that we consider, on the one hand, what API is required for all
the intended functionalities and, on the other (in a *different* "maven
module"), all the conversions that may be implemented for the convenience of
users.
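
A rough sketch of what that split could look like, purely as an illustration: both class names, the accessor names and the "y,x1,x2,..." line format are assumptions made up for this example, not an existing or proposed API. The first class is the kind of type a core module might expose; the second is the kind of convenience code that could live in a separate module.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical core-module type: holds data, does no parsing or file access. */
final class RegressionData {
    private final double[] y;
    private final double[][] x;

    RegressionData(double[] y, double[][] x) {
        if (y.length != x.length) {
            throw new IllegalArgumentException("row count mismatch");
        }
        this.y = y.clone();
        this.x = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            this.x[i] = x[i].clone();
        }
    }

    int size() {
        return y.length;
    }

    double response(int row) {
        return y[row];
    }

    double regressor(int row, int col) {
        return x[row][col];
    }
}

/** Hypothetical conversion-module helper: the only place that knows about text input. */
final class CsvConversions {
    private CsvConversions() {
    }

    /** Parses lines of the assumed form "y,x1,x2,..." (no header, no quoting). */
    static RegressionData fromLines(List<String> lines) {
        double[] y = new double[lines.size()];
        double[][] x = new double[lines.size()][];
        for (int i = 0; i < lines.size(); i++) {
            String[] fields = lines.get(i).split(",");
            y[i] = Double.parseDouble(fields[0].trim());
            x[i] = new double[fields.length - 1];
            for (int j = 1; j < fields.length; j++) {
                x[i][j - 1] = Double.parseDouble(fields[j].trim());
            }
        }
        return new RegressionData(y, x);
    }

    public static void main(String[] args) {
        RegressionData data = fromLines(Arrays.asList("1.0,2.0,3.0", "4.0,5.0,6.0"));
        System.out.println(data.size() + " rows, first response " + data.response(0));
    }
}
```

With this layout the core type can be tested without any I/O, and new input formats only add code on the conversion side.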

> while allowing multiple types of regression to be calculated via a universal 
> form….
> which could become a 

Re: [io] NIO2 and non-default file system support

2019-05-16 Thread Gary Gregory
The one patch has been merged.

Gary

On Thu, May 16, 2019, 02:03 Matt Sicker  wrote:

> I’ve been interested in seeing IO be updated to include support for using
> Path instead of just File. I can review your PR this week, though I’m
> certainly not the only one who can.
>
> On Tue, May 14, 2019 at 13:19, Chesney, Mark 
> wrote:
>
> > Hello,
> >
> > A while back I ran into a situation where I needed to read the lines of a
> > file that might be on a non-default file system, like an in-memory file
> > system, on Java 7+. I looked to the commons-io ReversedLinesFileReader, but
> > it only works with java.io.File files, which are always on the default file
> > system. I duplicated the class in my project and found it was
> > relatively straightforward to adapt it to support both java.io.File and
> > java.nio.file.Path files. Commons-io 2.6 seems to be the first version to
> > require Java 7, which introduced NIO2. I think others would appreciate the
> > NIO2 constructors, saving a call to Path#toFile(), even if they're not
> > using non-default file systems. I previously created a JIRA issue IO-578
> > <https://issues.apache.org/jira/browse/IO-578> and a GitHub pull request
> > #62. I feel the PR is of
> > very high quality, short and to the point, ready or nearly ready to merge.
> > I would appreciate any feedback I can get on the JIRA issue or GitHub PR.
> > I'm hopeful this could make commons-io 2.7. Thanks for your attention and
> > consideration.
> >
> >
> >
> > Regards,
> >
> > Mark
> >
> --
> Matt Sicker 
>


[rng] stress test results

2019-05-16 Thread Alex Herbert
I have run the stress test using the new application. The new application has 
two major changes over the previous application:

1. It detects the platform byte-order and sends the bits in the correct order 
to be read by a C application
2. The bridge to TestU01 has been updated to use all the input int values; 
previously it was using every other int value.

So we can expect differences from both test suites Dieharder and TestU01 
BigCrush.
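
For anyone unfamiliar with the byte-order issue in point 1, here is a minimal sketch of the idea. It is not the code of the stress test application; the class and method names are made up for illustration. Java's DataOutputStream always writes big-endian, so on a little-endian platform each int has to be byte-swapped before a C program reading native 32-bit integers will see the same values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;

final class NativeOrderWriter {
    private NativeOrderWriter() {
    }

    /** Encodes the values so that a reader using the given byte order recovers them. */
    static byte[] toNativeBytes(int[] values, ByteOrder order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(values.length * 4);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                // writeInt is big-endian, so swap the bytes for a little-endian reader.
                out.writeInt(order == ByteOrder.LITTLE_ENDIAN ? Integer.reverseBytes(v) : v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // ByteOrder.nativeOrder() reports the byte order of the current platform.
        byte[] b = toNativeBytes(new int[] {0x01020304}, ByteOrder.nativeOrder());
        System.out.printf("%02x %02x %02x %02x%n", b[0], b[1], b[2], b[3]);
    }
}
```

Skipping the swap on a little-endian machine hands the test suite byte-reversed integers, which is what the previous application was effectively doing.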

For reference here are the old results (from the user guide, reordered to the 
RandomSource enum order):

RNG                 Dieharder     TestU01 (BigCrush)
JDK                 11, 12, 13    74, 72, 75
WELL_512_A          0, 0, 0       7, 6, 6
WELL_1024_A         0, 0, 0       4, 4, 5
WELL_19937_A        0, 0, 0       3, 2, 3
WELL_19937_C        0, 1, 0       2, 2, 3
WELL_44497_A        0, 0, 0       2, 3, 3
WELL_44497_B        0, 0, 0       2, 2, 2
MT                  0, 1, 0       3, 2, 2
ISAAC               0, 0, 1       0, 1, 0
SPLIT_MIX_64        0, 0, 0       2, 0, 0
XOR_SHIFT_1024_S    0, 0, 0       2, 0, 0
TWO_CMRES           1, 1, 1       0, 0, 1
MT_64               0, 0, 1       3, 2, 3
MWC_256             0, 0, 0       0, 0, 0
KISS                0, 0, 0       1, 2, 0

Here are the new results:

RNG                     Dieharder   TestU01 (BigCrush)
JDK                     4,4,4,4,4   74,72,74,73,74
WELL_512_A              0,0,0,0,0   7,6,6,6,6
WELL_1024_A             0,0,0,0,0   4,4,5,4,4
WELL_19937_A            0,1,0,0,1   3,3,2,2,2
WELL_19937_C            0,0,0,0,0   2,2,3,2,2
WELL_44497_A            0,0,0,0,0   2,2,2,2,3
WELL_44497_B            0,0,0,0,0   2,3,2,2,2
MT                      0,0,0,0,0   2,3,2,2,2
ISAAC                   0,0,0,0,0   0,1,2,0,0
SPLIT_MIX_64            0,0,0,0,0   1,0,0,0,0
XOR_SHIFT_1024_S        0,0,0,0,0   0,0,0,0,0
TWO_CMRES               2,2,2,2,2   4,3,3,5,4
MT_64                   0,0,0,0,0   2,3,2,2,2
MWC_256                 0,1,0,0,0   0,0,0,2,0
KISS                    0,0,0,0,0   0,0,0,0,0
XOR_SHIFT_1024_S_PHI    0,0,0,0,0   0,0,0,0,0
XO_RO_SHI_RO_64_S       0,0,0,0,0   1,1,2,1,3
XO_RO_SHI_RO_64_SS      0,0,0,0,0   0,0,0,0,0
XO_SHI_RO_128_PLUS      0,0,1,0,0   1,2,2,1,1
XO_SHI_RO_128_SS        0,0,0,1,0   0,1,0,0,0
XO_RO_SHI_RO_128_PLUS   0,0,0,0,0   0,1,0,0,0
XO_RO_SHI_RO_128_SS     0,0,0,0,0   1,0,1,0,0
XO_SHI_RO_256_PLUS      0,1,0,0,0   0,0,0,0,0
XO_SHI_RO_256_SS        0,0,0,0,0   0,1,0,2,1
XO_SHI_RO_512_PLUS      0,0,0,0,1   0,0,0,2,2
XO_SHI_RO_512_SS        0,0,0,0,0   0,1,0,1,0

(Note: All of the single fails except one under Dieharder are for the flawed 
diehard_sums test. I include it here for direct comparison with old results. I 
would recommend we strip this from the new results for the user guide.)

I ran them 3 times. Then, because the results were different (mainly for the JDK 
generator for Dieharder), I double-checked everything and ran another 2. 
Results are still the same. The JDK generator does much better on Dieharder 
than previously, but it still systematically fails:

diehard_opso:0
diehard_oqso:0
diehard_dna:0
dab_bytedistrib:0

The TWO_CMRES generator is now worse as it is systematically failing:

diehard_oqso:0
diehard_dna:0

The results from BigCrush are similar for JDK and all the others except 
TWO_CMRES. This is now failing a few more tests. It systematically fails:

1  SerialOver, r = 0
41  Permutation, t = 5
42  Permutation, t = 7

To check the JDK results for Dieharder I ran it 5 times using the wrong 
platform byte order (i.e. what the previous test application was doing).

Old results : 11, 12, 13
New results: 11,16,14,14,15

So this matches up. If the JDK output is byte reversed it is a poor generator.

A few sources I have read indicate that BigCrush favours the upper bits of a 
generator. A test should therefore run the generator bit reversed through the 
test application. Here are the full forward and backward results ignoring the 
Diehard sums test:

RNG            Bit-reversed   Dieharder        TestU01 (BigCrush)
JDK            false          4,4,4,4,4        74,72,74,73,74
JDK            true           42,42,43,49,49   35,34,35,36,36
WELL_512_A     false          0,0,0,0,0        7,6,6,6,6
WELL_512_A     true           0,0,1,0,0        7,6,6,7,6
WELL_1024_A    false          0,0,0,0,0        4,4,5,4,4
WELL_1024_A    true           0,0,0,0,0        4,4,4,4,4
WELL_19937_A   false          0,1,0,0,0        3,3,2,2,2
WELL_19937_A   true
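
As a small illustration of what "bit reversed" means for the rows above, here is a sketch. It is not the stress test application's code, and the wrapper name is hypothetical. Each 32-bit output is passed through Integer.reverse, so the generator's lowest bit becomes the highest bit seen by the test suite. Note this is different from the byte-order swap (Integer.reverseBytes) discussed earlier.

```java
import java.util.Random;
import java.util.function.IntSupplier;

final class BitReversedSource {
    private BitReversedSource() {
    }

    /** Wraps a source of ints so that every value is delivered with its 32 bits reversed. */
    static IntSupplier reversed(IntSupplier source) {
        return () -> Integer.reverse(source.getAsInt());
    }

    public static void main(String[] args) {
        Random jdk = new Random(42L);
        IntSupplier backward = reversed(jdk::nextInt);
        // Print one bit-reversed output in binary.
        System.out.println(Integer.toBinaryString(backward.getAsInt()));
    }
}
```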

Re: [rng] Information Pertaining to Running Benchmarks

2019-05-16 Thread Alex Herbert


> On 16 May 2019, at 06:45, Abhishek Dhadwal  wrote:
> 
> On Thu, May 9, 2019 at 4:26 AM Alex Herbert 
> wrote:
> 
> 
>> You have to run Maven from the appropriate sub-directory. Your screenshot
>> on Slack shows you are in the top level commons-rng directory:
>> 
>> $ cd commons-rng-examples/examples-jmh
>> $ mvn javadoc:javadoc
>> 
>> That should work.
>> 
> It did! Thank you!
> 
>> The JMH tutorial requires you to download JMH from its own source
>> repository. They use Mercurial (an alternative source control system from
>> git). This is the ‘hg’ command (Mercury is element Hg) in the list of
>> instructions I sent.
>> 
>> Try running through them step-by-step. It should work as these are the
>> commands that I used to get it running.
>> 
> Turned out I was building in the wrong directory.
> I changed the directories and tried building the JMH project in order to go
> through the tutorials using mvn clean install. I got a build failure. The
> error report is as follows :
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.0-M1:jar (attach-javadoc)
> on project jmh-samples: MavenReportException: Error while generating
> Javadoc:
> [ERROR] Exit code: 1 - javadoc: error - The code being documented uses
> modules but the packages defined in
> http://docs.oracle.com/javase/7/docs/api/ are in the unnamed module.
> [ERROR]
> [ERROR] Command line was: "C:\Program
> Files\Java\jdk-11.0.2\bin\javadoc.exe" @options @packages
> [ERROR]
> [ERROR] Refer to the generated Javadoc files in
> 'C:\Users\abhi1\Documents\GitHub\commons-rng\jmh\jmh-samples\target\apidocs'
> dir.

This is due to stricter Java 11 implementation of Javadoc with regard to 
modules. It is the same error that prevented you from building the entire RNG 
project using Java 11. The fix (for Java 7) requires adding the 
7 tag into the javadoc plugin of the pom. This is not worth it 
for just building JMH. I would recommend you install JDK 8 and make it your 
main java version. It would be rare that you need JDK 11 to build most 
software. You do not need it for anything you will do on the commons RNG 
project.

So for now install JDK 8 and try and run through the JMH tutorials.


Note that although it is possible to make Commons RNG build the javadoc under 
Java 11, this then prevents a full build of the site for deployment, as one of 
the plugins does not handle Java 11 source. So Commons RNG has to be built with 
Java 9. This is an issue that may have to be resolved at the time of the next 
release. This problem with the strictness of javadoc in Java 11 is scheduled 
to be fixed in 11.0.3. All my attempts to make this work are documented 
here:

https://github.com/apache/commons-rng/pull/32 



> 
>> Alex
>> 
> Regards,
> Abhishek



[GSoC][STATISTICS][Regression] Architecture Implementation Suggestions

2019-05-16 Thread Ben Nguyen
Hello,

I have some broad ideas about how the regression module should be structured, 
as outlined briefly, with UML diagrams, in my proposal.
This is the current implementation inside commons-math-stat-regression:




This is my proposed idea, where the structure was partly inspired by SuanShu 
since it supported multiple types of regression (including logistic):
https://github.com/aaiyer/SuanShu/tree/master/src/main/java/com/numericalmethod/suanshu/stats/regression/linear

Disclaimer: I have only studied some econometrics and second-year computer 
science at university, so I have zero professional data engineering experience, 
but I am excited to start learning with this project. So I don’t currently know 
the exact needs of data engineers with regard to this module and am learning as 
I go, which is why I would very much appreciate any input on the kinds of 
requirements data engineers would want from this regression module. 

From someone who has used the current implementation or will use this new 
implementation:
- What would make your life easier? 
- What should definitely be kept? 
- What should be added/improved?
- Any specific features or design criteria? 
- Any changes or radically different approaches to the following idea?

Note: OLS, GLS and Logistic regression are the first to be implemented, with a 
focus on architectural support for further additions. Changes will make 
use of new Java 8 features, specifically the Java Streams API, to improve 
performance and readability.
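
To give a feel for the Streams style mentioned in the note, here is a minimal sketch of a one-regressor OLS fit. It is an illustration only, not the proposed module: the class and method names are hypothetical, and it uses the textbook closed form (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)) rather than any matrix library. Whether streams actually help performance here would need benchmarking.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class SimpleOlsStreams {
    private SimpleOlsStreams() {
    }

    /** Returns {intercept, slope} of the least-squares line through (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        // Sums of cross-products and squares of the centred data.
        double sxy = IntStream.range(0, x.length)
                .mapToDouble(i -> (x[i] - meanX) * (y[i] - meanY))
                .sum();
        double sxx = IntStream.range(0, x.length)
                .mapToDouble(i -> (x[i] - meanX) * (x[i] - meanX))
                .sum();
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values are all identical");
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        double[] beta = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("intercept=" + beta[0] + ", slope=" + beta[1]);
    }
}
```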



Updates to this proposed implementation UML in my proposal:
- “statistics-regression-reqLinearMath” will be replaced with EJML, as
  suggested by Mr. Eric Barnhill.
  o This will include a custom matrix class extended from EJML’s SimpleBase ->
    StatisticsMatrix.
  o So if we decide to use an Apache Commons implementation of matrices later
    on, only this class should need to be changed internally.
- Abstract classes should have interfaces above them, or perhaps just be
  interfaces if a simpler approach is implemented (i.e. minimal OOP).

Notes about this proposed implementation:
- AbstractVariables and its child classes may not be necessary, i.e. just
  Estimators and Residuals classes.
- Or perhaps it’s best to follow the current implementation’s example and have
  a single class per regression type for hierarchy simplicity (but risking
  redundancies)?
- I have not looked into specific data members or individual methods yet. So
  far I am just taking notes from the current implementation and SuanShu.
- The “statistics-regression-updating” components have quite complex
  algorithms, which will require a lot of time for me to understand completely.
  o So for now, I see myself making minimal changes to them, prioritizing the
    new “stored” components.
- RegressionDataLoader’s purpose is to:
  o provide a clean input interface
  o and to ensure that data from, say, double[][] is only converted to working
    form as a StatisticsMatrix object once
    • while allowing multiple types of regression to be calculated via a
      universal form...
    • which could become a challenge once details are in order.

This is the current state of my plan. With your input, I will move on to the 
next steps, plan more details, and start creating the software flowchart.

Thank you in advance for any advice/suggestions,
-Ben Nguyen

Re: [io] NIO2 and non-default file system support

2019-05-16 Thread Matt Sicker
I’ve been interested in seeing IO be updated to include support for using
Path instead of just File. I can review your PR this week, though I’m
certainly not the only one who can.
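
For context on why Path support matters, here is a small self-contained sketch. It is not the code from the PR and not how ReversedLinesFileReader works internally: it simply reads all lines and reverses them in memory, and the class and method names are made up. It uses the JDK's zip file system as a stand-in for "a non-default file system"; a Path inside it cannot be turned into a java.io.File, but the NIO2 Files API handles it fine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PathLinesDemo {
    private PathLinesDemo() {
    }

    /** Reads every line of the file, last line first. Works for any file system provider. */
    static List<String> linesReversed(Path path) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(path, StandardCharsets.UTF_8));
        Collections.reverse(lines);
        return lines;
    }

    /** Writes three lines into a file inside a temporary zip archive and reads them back reversed. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("nio2-demo");
            Path zip = dir.resolve("demo.zip");
            Map<String, String> env = new HashMap<>();
            env.put("create", "true"); // ask the zip provider to create the archive
            try (FileSystem fs = FileSystems.newFileSystem(URI.create("jar:" + zip.toUri()), env)) {
                Path inside = fs.getPath("/log.txt");
                Files.write(inside, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);
                // inside.toFile() would throw UnsupportedOperationException here.
                return linesReversed(inside);
            } finally {
                // The zip file system is already closed at this point.
                Files.deleteIfExists(zip);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A reader class that only accepts java.io.File cannot be used on the path inside the archive, which is the gap the proposed NIO2 constructors are meant to close.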

On Tue, May 14, 2019 at 13:19, Chesney, Mark 
wrote:

> Hello,
>
> A while back I ran into a situation where I needed to read the lines of a
> file that might be on a non-default file system, like an in-memory file
> system, on Java 7+. I looked to the commons-io ReversedLinesFileReader, but
> it only works with java.io.File files, which are always on the default file
> system. I duplicated the class in my project and found it was
> relatively straightforward to adapt it to support both java.io.File and
> java.nio.file.Path files. Commons-io 2.6 seems to be the first version to
> require Java 7, which introduced NIO2. I think others would appreciate the
> NIO2 constructors, saving a call to Path#toFile(), even if they're not
> using non-default file systems. I previously created a JIRA issue IO-578
> <https://issues.apache.org/jira/browse/IO-578> and a GitHub pull request
> #62. I feel the PR is of
> very high quality, short and to the point, ready or nearly ready to merge.
> I would appreciate any feedback I can get on the JIRA issue or GitHub PR.
> I'm hopeful this could make commons-io 2.7. Thanks for your attention and
> consideration.
>
>
>
> Regards,
>
> Mark
>
-- 
Matt Sicker