Re: [Statistics] Port codes from Commons Math

2018-05-20 Thread Gimhana Nadeeshan
Hi all,

Porting the *Interval Module*

JIRA SUB-TASK : Interval Module Porting

DESCRIPTION   : Port the classes in org.apache.commons.math4.stat.interval
and redesign the module architecture.

I have created a core module that contains all the common classes (this
solves the dependency issues) and a separate module for the specialized
classes.

STRUCTURE  :

commons-statistics-core
 --- Localizable.java
 --- LocalizedFormats.java

commons-statistics-interval
 --- AgrestiCoullInterval.java
 --- BinomialConfidenceInterval.java
 --- ClopperPearsonInterval.java
 --- ConfidenceInterval.java
 --- IntervalUtils.java
 --- NormalApproximationInterval.java
 --- WilsonScoreInterval.java

All the exceptions have been replaced with IllegalArgumentException.

The code is open for review and discussion.
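
For reviewers, here is a minimal sketch of what argument checking with IllegalArgumentException can look like. The class name IntervalSketch, its accessors, and the messages are hypothetical and are not taken from the ported code; only the use of IllegalArgumentException comes from the description above.

```java
/** Hypothetical value class, for illustration only (not the ported ConfidenceInterval). */
final class IntervalSketch {
    private final double lower;
    private final double upper;
    private final double confidenceLevel;

    IntervalSketch(double lower, double upper, double confidenceLevel) {
        // Reject inverted or degenerate bounds; the negated form also rejects NaN.
        if (!(lower < upper)) {
            throw new IllegalArgumentException(
                "Lower bound " + lower + " must be below upper bound " + upper);
        }
        // The confidence level must lie strictly between 0 and 1.
        if (!(confidenceLevel > 0 && confidenceLevel < 1)) {
            throw new IllegalArgumentException(
                "Confidence level " + confidenceLevel + " is outside (0, 1)");
        }
        this.lower = lower;
        this.upper = upper;
        this.confidenceLevel = confidenceLevel;
    }

    double getLowerBound() { return lower; }
    double getUpperBound() { return upper; }
    double getConfidenceLevel() { return confidenceLevel; }
}
```

A standard exception keeps such a module free of a dependency on the Commons Math exception classes, at the cost of losing their localized messages.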

Best Regards,
Gimhana.

On 18 May 2018 at 03:02, Gilles  wrote:

> Hi Gimhana.
>
> On Fri, 18 May 2018 00:16:04 +0530, Gimhana Nadeeshan wrote:
>
>> Hi all,
>>
>>> We might want to create a public branch for that work in order to
>>> merge PRs more quickly without risk of breaking "master".
>>> What do you think?  Eric?
>>
>> I ported the Statistics Interval Module and would like to get your
>> reviews.
>> How should I make the Pull request ?
>>
>
> I've just created a new branch on the repository; please make
> all PR refer to "task_STATISTICS-5".
> I also suggest that you create finer-grained "sub-tasks" of
>   https://issues.apache.org/jira/browse/STATISTICS-5
>
> Thanks,
> Gilles
>
>
>
>> Best Regards,
>> Gimhana
>>
>>

Re: [Statistics] Port codes from Commons Math

2018-05-17 Thread Gilles

Hi Gimhana.

On Fri, 18 May 2018 00:16:04 +0530, Gimhana Nadeeshan wrote:

> Hi all,
>
>> We might want to create a public branch for that work in order to
>> merge PRs more quickly without risk of breaking "master".
>> What do you think?  Eric?
>
> I ported the Statistics Interval Module and would like to get your
> reviews.
> How should I make the Pull request ?

I've just created a new branch on the repository; please make
all PR refer to "task_STATISTICS-5".
I also suggest that you create finer-grained "sub-tasks" of
  https://issues.apache.org/jira/browse/STATISTICS-5

Thanks,
Gilles

> Best Regards,
> Gimhana










-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Statistics] Port codes from Commons Math

2018-05-17 Thread Gimhana Nadeeshan
Hi all,

> We might want to create a public branch for that work in order to
> merge PRs more quickly without risk of breaking "master".
> What do you think?  Eric?
>

I ported the Statistics Interval Module and would like to get your reviews.
How should I make the Pull request ?

Best Regards,
Gimhana


On 5 May 2018 at 18:50, Gilles  wrote:

> Hi Gimhana.
>
> On Sat, 5 May 2018 15:50:43 +0530, Gimhana Nadeeshan wrote:
>
>> Hello all,
>>
>> As I proposed early I would like to begin port code from Commons-math
>>  to Commons-statistics
>> .
>> (For further details refer my  GSoC Proposal
>>
>> > OBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>
>> though I'm not selected this year)
>>
>> This is my proposed architecture in brief
>>
>>1. Commons-Statistics-Core => Frequency and StatUtils classes (Can add
>>more common classes while implementing)
>>2. Commons-Statistics-Correlation
>>3. Commons-Statistics-Descriptive
>>4. Commons-Statistics-Inference
>>5. Commons-Statistics-Interval
>>6. Commons-Statistics-Ranking
>>7. Commons-Statistics-Regression
>>
>
> Nit-pick: module names have no capital in them (just a convention).
> So: "commons-statistics-core" rather than "Commons-Statistics-Core", etc.
>
> While I referring Commons-Geometry
>>
>
> No need to refer to that project since "Commons Statistics" has been
> set up:
>   http://commons.apache.org/proper/commons-statistics/
>
> The code repository is here:
>   https://git1-us-west.apache.org/repos/asf?p=commons-statisti
> cs.git;a=tree
> It already contains a "commons-statistics-distribution" module whose
> layout can be duplicated in the modules which you are proposing above
> (with appropriate changes of course).
>
> ported code to get a head start , I
>> found that each module inside, contain a pox.xml file. Are they
>> implemented
>> as separate projects and then group in the same package? I'm asking
>> because
>> Since I'm new to code porting :-).
>>
>
> A requirement is that no package should be shared between different
> modules; by convention, the top-level package of module
>   commons-statistics-descriptive
> would be
>   org.apache.commons.statistics.descriptive
>
> [And so on for the other modules. But I'd suggest you start with one.]
>
> If so in here should I create all 7 projects and then group those in same
>> project.
>>
>
> No, the project is "Commons Statisitics" and it would contain several
> _maven_ modules, each of which should ultimately map to a _JPMS_ (JDK9)
> module).
>
> Firstly I suppose to start port Ranking Module as it has less
>> dependencies comparing to others.
>>
>
> Fine. But don't forget to browse through the JIRA issues of Commons
> Math (CM) for things that would need fixing.  Whenever it's the case,
> please open a report in the new JIRA project (linking to the CM
> report), and post here your proposed solution (or questions).
>
> We might want to create a public branch for that work in order to
> merge PRs more quickly without risk of breaking "master".
> What do you think?  Eric?
>
> Would someone help me to get a head start ??
>>
>
> What else do you need?
>
> Best regards,
> Gilles
>
> Best Regards,
>> Gimhana.
>>
>>
>> [...]

>>>
>
>
>


Re: [Statistics] Port codes from Commons Math

2018-05-05 Thread Gilles

Hi Gimhana.

On Sat, 5 May 2018 15:50:43 +0530, Gimhana Nadeeshan wrote:

> Hello all,
>
> As I proposed early I would like to begin port code from Commons-math
> to Commons-statistics.
> (For further details refer my GSoC Proposal
> though I'm not selected this year)
>
> This is my proposed architecture in brief
>
>    1. Commons-Statistics-Core => Frequency and StatUtils classes (Can add
>    more common classes while implementing)
>    2. Commons-Statistics-Correlation
>    3. Commons-Statistics-Descriptive
>    4. Commons-Statistics-Inference
>    5. Commons-Statistics-Interval
>    6. Commons-Statistics-Ranking
>    7. Commons-Statistics-Regression

Nit-pick: module names have no capital in them (just a convention).
So: "commons-statistics-core" rather than "Commons-Statistics-Core", etc.

> While I referring Commons-Geometry

No need to refer to that project since "Commons Statistics" has been
set up:
  http://commons.apache.org/proper/commons-statistics/

The code repository is here:
  https://git1-us-west.apache.org/repos/asf?p=commons-statistics.git;a=tree

It already contains a "commons-statistics-distribution" module whose
layout can be duplicated in the modules which you are proposing above
(with appropriate changes of course).

> ported code to get a head start , I
> found that each module inside, contain a pom.xml file. Are they implemented
> as separate projects and then group in the same package? I'm asking because
> Since I'm new to code porting :-).

A requirement is that no package should be shared between different
modules; by convention, the top-level package of module
  commons-statistics-descriptive
would be
  org.apache.commons.statistics.descriptive

[And so on for the other modules. But I'd suggest you start with one.]

> If so in here should I create all 7 projects and then group those in same
> project.

No, the project is "Commons Statistics" and it would contain several
_maven_ modules, each of which should ultimately map to a _JPMS_ (JDK9)
module.

> Firstly I suppose to start port Ranking Module as it has less
> dependencies comparing to others.

Fine. But don't forget to browse through the JIRA issues of Commons
Math (CM) for things that would need fixing.  Whenever it's the case,
please open a report in the new JIRA project (linking to the CM
report), and post here your proposed solution (or questions).

We might want to create a public branch for that work in order to
merge PRs more quickly without risk of breaking "master".
What do you think?  Eric?

> Would someone help me to get a head start ??

What else do you need?

Best regards,
Gilles

> Best Regards,
> Gimhana.
>
> [...]






Re: [Statistics] Port codes from Commons Math

2018-05-05 Thread Gimhana Nadeeshan
Hello all,

As I proposed earlier, I would like to begin porting code from Commons-math
to Commons-statistics.
(For further details, refer to my GSoC Proposal,
though I was not selected this year.)

This is my proposed architecture in brief:

   1. Commons-Statistics-Core => Frequency and StatUtils classes (more
   common classes can be added while implementing)
   2. Commons-Statistics-Correlation
   3. Commons-Statistics-Descriptive
   4. Commons-Statistics-Inference
   5. Commons-Statistics-Interval
   6. Commons-Statistics-Ranking
   7. Commons-Statistics-Regression

While referring to the Commons-Geometry ported code to get a head start, I
found that each module inside contains a pom.xml file. Are they implemented
as separate projects and then grouped in the same package? I'm asking
because I'm new to code porting :-).

If so, should I create all 7 projects here and then group them in the same
project? I plan to start by porting the Ranking module, as it has fewer
dependencies than the others.

Would someone help me to get a head start?

Best Regards,
Gimhana.


On 14 April 2018 at 14:24, Gimhana Nadeeshan <
gimhanadesilva...@cse.mrt.ac.lk> wrote:

> Hello devs,
>
> *Covariance stats=
>> > IntStream.of(1,2,3).collect(Covariance::new,Covariance::acce
>> pt,Covariance::combine);*
>>
>>
>> Can you explain a bit more what is happening with the method references
>> "accept" and "combine"?
>>
>
> The mutable reduction operation - collect() accumulates input elements
> into a mutable result container, such as a Collection. It requires 3
> functions. A *supplier function* construct new instance of the result
> container. An *accumulator function *incorporate an input element into a
> result container and a *combining function* to merge the contents of one
> result container into another.
>
> So the accept() method, Records a new value into the result container.
> (Here Covariance Object). Accepting the values in the Stream, to the
> Covariance Object. It is the functionality of the functional interface I'm
> going to implement to make use the Lambda Expressions of Java8.
>
> combine()  method will combine the state of another Covariance Object
> into this one. It merges the results of one results container to another.
> Generation of new object is replaced by Replacing.
>
> As a whole the meaning of those implementation is like generating a single
> string object by concatenating strings in an array list. All the
> statistical functionalities are served as a state object in this
> implementation.
>
> *Week 2: Begin porting the code according to the dependency hierarchy
>> > identified. *
>> >
>>
>> Sorry but I cannot see where you identify the dependency hierarchy. Are
>> you
>> referring to your diagram?
>
>
> Dependency Hierarchy is not mentioned separately in the proposal. But I
> have created the Time-line of the proposed project according to that. Less
> dependent modules are porting at the beginning and gradually going for the
> more coupled ones. So at that point of view I am going to port Ranking
> Module at the beginning and gradually port Interval,Regression,
> Descriptive,Correlation,Interference modules and so on.
>
> A further comment: L1-type statistics such as median and quantiles can also
>> be included in the API by using the stream.sorted() method to sort the
>> stream first.
>>
>> While it is true medians can be in the aggregate sped up by partitioning
>> algorithms, I think making use of built-in methods like sorted() is still
>> likely to produce the best and most consistent performance with the JVM.
>
>
> Definitely. Using built-in-methods provided, will make the package
> performance and the ease of use and using inbuilt-methods where is possible
> is one of the main goals of the proposed project.
>
> Best Regards,
> Gimhana.
>
>
> Nadeeshan Gimhana
>
> Batch Representative (15' batch)
>
> Department of Computer Science & Engineering
>
> University of Moratuwa
>
>
> On 13 April 2018 at 12:26, Eric Barnhill  wrote:
>
>> A further comment: L1-type statistics such as median and quantiles can
>> also
>> be included in the API by using the stream.sorted() method to sort the
>> stream first.
>>
>> While it is true medians can be in the aggregate sped up by partitioning
>> algorithms, I think making use of built-in methods like sorted() is still
>> likely to produce the best and most consistent performance with the JVM.
>>
>> On Thu, Apr 12, 

Re: [Statistics] Port codes from Commons Math

2018-04-14 Thread Gimhana Nadeeshan
Hello devs,

> > *Covariance stats=
> > IntStream.of(1,2,3).collect(Covariance::new,Covariance::accept,Covariance::combine);*
>
> Can you explain a bit more what is happening with the method references
> "accept" and "combine"?
>
>

The mutable reduction operation collect() accumulates input elements into
a mutable result container, such as a Collection. It requires three
functions: a *supplier function* that constructs a new instance of the
result container, an *accumulator function* that incorporates an input
element into a result container, and a *combining function* that merges the
contents of one result container into another.

So the accept() method records a new value into the result container (here,
the Covariance object): it accepts the values of the stream into the
Covariance object. It is the method of the functional interface I'm going
to implement in order to make use of the lambda expressions of Java 8.

The combine() method combines the state of another Covariance object into
this one, that is, it merges the results of one result container into
another. Instead of generating a new object, the state of the existing
object is replaced.

As a whole, the implementation is analogous to generating a single string
object by concatenating the strings in an array list. All the statistical
functionality is served by a state object in this implementation.
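
The three roles described above can be sketched with a much simpler container than Covariance. MeanSketch below is a hypothetical class written only for illustration (it is not part of the proposal or of any Commons API); IntStream.collect(supplier, accumulator, combiner) is the standard Java 8 method.

```java
import java.util.stream.IntStream;

/** Hypothetical state object: accumulates a running mean of int values. */
final class MeanSketch {
    private long count;
    private double sum;

    /** Accumulator: records one new value into this container. */
    void accept(int value) {
        count++;
        sum += value;
    }

    /** Combiner: merges the state of another container into this one. */
    void combine(MeanSketch other) {
        count += other.count;
        sum += other.sum;
    }

    /** Returns the mean of the recorded values, or NaN if none were recorded. */
    double getMean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Supplier, accumulator and combiner are all passed as method references. */
    static MeanSketch of(int... values) {
        return IntStream.of(values)
                .collect(MeanSketch::new, MeanSketch::accept, MeanSketch::combine);
    }
}
```

MeanSketch.of(1, 2, 3).getMean() yields 2.0. The combiner matters when a parallel stream fills several partial containers that have to be merged into one result.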

> > *Week 2: Begin porting the code according to the dependency hierarchy
> > identified.*
>
> Sorry but I cannot see where you identify the dependency hierarchy. Are you
> referring to your diagram?


The dependency hierarchy is not mentioned separately in the proposal, but I
created the timeline of the proposed project according to it. The less
dependent modules are ported first, followed gradually by the more coupled
ones. From that point of view, I am going to port the Ranking module first
and then gradually port the Interval, Regression, Descriptive, Correlation
and Inference modules, and so on.

> A further comment: L1-type statistics such as median and quantiles can also
> be included in the API by using the stream.sorted() method to sort the
> stream first.
>
> While it is true medians can be in the aggregate sped up by partitioning
> algorithms, I think making use of built-in methods like sorted() is still
> likely to produce the best and most consistent performance with the JVM.


Definitely. Using the built-in methods provided will improve the package's
performance and ease of use, and using built-in methods wherever possible
is one of the main goals of the proposed project.

Best Regards,
Gimhana.


Nadeeshan Gimhana

Batch Representative (15' batch)

Department of Computer Science & Engineering

University of Moratuwa


On 13 April 2018 at 12:26, Eric Barnhill  wrote:

> A further comment: L1-type statistics such as median and quantiles can also
> be included in the API by using the stream.sorted() method to sort the
> stream first.
>
> While it is true medians can be in the aggregate sped up by partitioning
> algorithms, I think making use of built-in methods like sorted() is still
> likely to produce the best and most consistent performance with the JVM.
>
> On Thu, Apr 12, 2018 at 2:03 PM, Eric Barnhill 
> wrote:
>
> > HI Gimhana,
> >
> > Sorry for the delay in response, but you posted this right before our
> > two-week Easter holiday, for which I was completely absent ; then I
> needed
> > a few days back at work to clean up all the mess. :)
> >
> > Your overall goals look good to me. You have gone right to the heart of
> > the matter and propose to reinvent the statistics tools to make good use
> of
> > the Java 8 API. I think that's great and you should get started. Your
> goal
> > of eliminating dependencies on Commons-Math is also right.
> >
> > I noticed this in the proposal:
> >
> > *Covariance stats=
> >> IntStream.of(1,2,3).collect(Covariance::new,Covariance::
> accept,Covariance::combine);*
> >
> >
> > Can you explain a bit more what is happening with the method references
> > "accept" and "combine"?
> >
> > Also this
> >
> > *Week 2: Begin porting the code according to the dependency hierarchy
> >> identified. *
> >>
> >
> > Sorry but I cannot see where you identify the dependency hierarchy. Are
> > you referring to your diagram?
> >
> > Eric
> >
> >
> > On Mon, Mar 26, 2018 at 8:07 AM, Gimhana Nadeeshan <
> > gimhanadesilva...@cse.mrt.ac.lk> wrote:
> >
> >> Hello devs,
> >>
> >> I have updated my draft proposal (Port codes from Commons Math
> >>  >> OBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>)
> >> -Timeline added; before submitting the final at the 

Re: [Statistics] Port codes from Commons Math

2018-04-13 Thread Eric Barnhill
A further comment: L1-type statistics such as median and quantiles can also
be included in the API by using the stream.sorted() method to sort the
stream first.

While it is true medians can be in the aggregate sped up by partitioning
algorithms, I think making use of built-in methods like sorted() is still
likely to produce the best and most consistent performance with the JVM.
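
A minimal sketch of the sorted() approach for the median, assuming the values arrive as a DoubleStream. MedianSketch and its median method are hypothetical names, not an existing Commons Statistics API.

```java
import java.util.stream.DoubleStream;

/** Hypothetical helper: median computed by sorting the stream first. */
final class MedianSketch {
    private MedianSketch() { }

    static double median(DoubleStream values) {
        // sorted() is a stateful operation: the whole stream is buffered here.
        double[] sorted = values.sorted().toArray();
        int n = sorted.length;
        if (n == 0) {
            throw new IllegalArgumentException("Median of an empty stream is undefined");
        }
        int mid = n / 2;
        // Odd count: the middle element; even count: mean of the two middle elements.
        return (n % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For example, MedianSketch.median(DoubleStream.of(3, 1, 2)) gives 2.0. Other quantiles could be read from the same sorted array, which is what makes the sort-first route simple, if not the asymptotically fastest one.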

On Thu, Apr 12, 2018 at 2:03 PM, Eric Barnhill 
wrote:

> HI Gimhana,
>
> Sorry for the delay in response, but you posted this right before our
> two-week Easter holiday, for which I was completely absent ; then I needed
> a few days back at work to clean up all the mess. :)
>
> Your overall goals look good to me. You have gone right to the heart of
> the matter and propose to reinvent the statistics tools to make good use of
> the Java 8 API. I think that's great and you should get started. Your goal
> of eliminating dependencies on Commons-Math is also right.
>
> I noticed this in the proposal:
>
> *Covariance stats=
>> IntStream.of(1,2,3).collect(Covariance::new,Covariance::accept,Covariance::combine);*
>
>
> Can you explain a bit more what is happening with the method references
> "accept" and "combine"?
>
> Also this
>
> *Week 2: Begin porting the code according to the dependency hierarchy
>> identified. *
>>
>
> Sorry but I cannot see where you identify the dependency hierarchy. Are
> you referring to your diagram?
>
> Eric
>
>
> On Mon, Mar 26, 2018 at 8:07 AM, Gimhana Nadeeshan <
> gimhanadesilva...@cse.mrt.ac.lk> wrote:
>
>> Hello devs,
>>
>> I have updated my draft proposal (Port codes from Commons Math
>> > OBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>)
>> -Timeline added; before submitting the final at the Google site. Feel free
>> to comment and give feedback to improve it.
>>
>> Best Regards,
>> Gimhana.
>>
>> On 24 March 2018 at 17:35, Gimhana Nadeeshan <
>> gimhanadesilva...@cse.mrt.ac.lk> wrote:
>>
>> > Hello devs,
>> >
>> >
>> >> Note that some of the repositories included in that screen do
>> >> not belong to "Commons":
>> >>  * sling-*
>> >>  * webservices-*
>> >>  * xml-*
>> >
>> >
>> > I'm working on it.(Still research on Kibble :-) )
>> >
>> > Botched alignments...
>> >> "cloc" has several output formats from which you could produce
>> >> nicer tables.
>> >
>> >
>> > I'm extremely sorry. I'll fix it asap.
>> >
>> > Best Regards,
>> > Gimhana
>> >
>> > On 23 March 2018 at 17:43, Gilles  wrote:
>> >
>> >> Hi Gimhana.
>> >>
>> >> On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:
>> >>
>> >>> Hello devs,
>> >>>
>> >>> By gone through @Gilles suggestions I found very interesting facts
>> about
>> >>> Commons projects.
>> >>>
>> >>> Feel free to check Kibble reports
>> >>>
>> >>> > >>> bfilter=commons=true=1458585000=1521743399>
>> >>> regarding these projects. It will be given a clear picture on the
>> >>> progress
>> >>> of projects.In the Commons Projects side it seems visible growth of
>> >>> contributors and releases.
>> >>>
>> >>
>> >> Note that some of the repositories included in that screen do
>> >> not belong to "Commons":
>> >>  * sling-*
>> >>  * webservices-*
>> >>  * xml-*
>> >>
>> >> There should be a way to filter them out.
>> >>
>> >> And I created a simple doc using the data collected from CLOC tool to
>> get
>> >>> an idea of commons projects. I think This kind of document will help
>> new
>> >>> volunteers to get a rough idea of the scope and the current status of
>> >>> projects before go deeper.Histogram of Commons Projects.
>> >>>
>> >>> > >>> 7V8LSglgsV5hBxVnLiCI/edit?usp=sharing>
>> >>>
>> >>
>> >> Botched alignments...
>> >> "cloc" has several output formats from which you could produce
>> >> nicer tables.
>> >>
>> >> Regards,
>> >> Gilles
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>
>


Re: [Statistics] Port codes from Commons Math

2018-04-12 Thread Eric Barnhill
Hi Gimhana,

Sorry for the delay in response, but you posted this right before our
two-week Easter holiday, for which I was completely absent ; then I needed
a few days back at work to clean up all the mess. :)

Your overall goals look good to me. You have gone right to the heart of the
matter and propose to reinvent the statistics tools to make good use of the
Java 8 API. I think that's great and you should get started. Your goal of
eliminating dependencies on Commons-Math is also right.

I noticed this in the proposal:

> *Covariance stats=
> IntStream.of(1,2,3).collect(Covariance::new,Covariance::accept,Covariance::combine);*


Can you explain a bit more what is happening with the method references
"accept" and "combine"?

Also this

> *Week 2: Begin porting the code according to the dependency hierarchy
> identified.*

Sorry but I cannot see where you identify the dependency hierarchy. Are you
referring to your diagram?

Eric


On Mon, Mar 26, 2018 at 8:07 AM, Gimhana Nadeeshan <
gimhanadesilva...@cse.mrt.ac.lk> wrote:

> Hello devs,
>
> I have updated my draft proposal (Port codes from Commons Math
>  eMnPaBsE9U5YhU/edit?usp=sharing>)
> -Timeline added; before submitting the final at the Google site. Feel free
> to comment and give feedback to improve it.
>
> Best Regards,
> Gimhana.
>
> On 24 March 2018 at 17:35, Gimhana Nadeeshan <
> gimhanadesilva...@cse.mrt.ac.lk> wrote:
>
> > Hello devs,
> >
> >
> >> Note that some of the repositories included in that screen do
> >> not belong to "Commons":
> >>  * sling-*
> >>  * webservices-*
> >>  * xml-*
> >
> >
> > I'm working on it.(Still research on Kibble :-) )
> >
> > Botched alignments...
> >> "cloc" has several output formats from which you could produce
> >> nicer tables.
> >
> >
> > I'm extremely sorry. I'll fix it asap.
> >
> > Best Regards,
> > Gimhana
> >
> > On 23 March 2018 at 17:43, Gilles  wrote:
> >
> >> Hi Gimhana.
> >>
> >> On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:
> >>
> >>> Hello devs,
> >>>
> >>> By gone through @Gilles suggestions I found very interesting facts
> about
> >>> Commons projects.
> >>>
> >>> Feel free to check Kibble reports
> >>>
> >>>  >>> bfilter=commons=true=1458585000=1521743399>
> >>> regarding these projects. It will be given a clear picture on the
> >>> progress
> >>> of projects.In the Commons Projects side it seems visible growth of
> >>> contributors and releases.
> >>>
> >>
> >> Note that some of the repositories included in that screen do
> >> not belong to "Commons":
> >>  * sling-*
> >>  * webservices-*
> >>  * xml-*
> >>
> >> There should be a way to filter them out.
> >>
> >> And I created a simple doc using the data collected from CLOC tool to
> get
> >>> an idea of commons projects. I think This kind of document will help
> new
> >>> volunteers to get a rough idea of the scope and the current status of
> >>> projects before go deeper.Histogram of Commons Projects.
> >>>
> >>>  >>> 7V8LSglgsV5hBxVnLiCI/edit?usp=sharing>
> >>>
> >>
> >> Botched alignments...
> >> "cloc" has several output formats from which you could produce
> >> nicer tables.
> >>
> >> Regards,
> >> Gilles
> >>
> >>
> >>
> >>
> >>
> >
>


Re: [Statistics] Port codes from Commons Math

2018-03-26 Thread Gimhana Nadeeshan
Hello devs,

I have updated my draft proposal (Port codes from Commons Math); a timeline
has been added. Before I submit the final version at the Google site, feel
free to comment and give feedback to improve it.

Best Regards,
Gimhana.

On 24 March 2018 at 17:35, Gimhana Nadeeshan <
gimhanadesilva...@cse.mrt.ac.lk> wrote:

> Hello devs,
>
>
>> Note that some of the repositories included in that screen do
>> not belong to "Commons":
>>  * sling-*
>>  * webservices-*
>>  * xml-*
>
>
> I'm working on it.(Still research on Kibble :-) )
>
> Botched alignments...
>> "cloc" has several output formats from which you could produce
>> nicer tables.
>
>
> I'm extremely sorry. I'll fix it asap.
>
> Best Regards,
> Gimhana
>
> On 23 March 2018 at 17:43, Gilles  wrote:
>
>> Hi Gimhana.
>>
>> On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:
>>
>>> Hello devs,
>>>
>>> By gone through @Gilles suggestions I found very interesting facts about
>>> Commons projects.
>>>
>>> Feel free to check Kibble reports
>>>
>>> >> bfilter=commons=true=1458585000=1521743399>
>>> regarding these projects. It will be given a clear picture on the
>>> progress
>>> of projects.In the Commons Projects side it seems visible growth of
>>> contributors and releases.
>>>
>>
>> Note that some of the repositories included in that screen do
>> not belong to "Commons":
>>  * sling-*
>>  * webservices-*
>>  * xml-*
>>
>> There should be a way to filter them out.
>>
>> And I created a simple doc using the data collected from CLOC tool to get
>>> an idea of commons projects. I think This kind of document will help new
>>> volunteers to get a rough idea of the scope and the current status of
>>> projects before go deeper.Histogram of Commons Projects.
>>>
>>> >> 7V8LSglgsV5hBxVnLiCI/edit?usp=sharing>
>>>
>>
>> Botched alignments...
>> "cloc" has several output formats from which you could produce
>> nicer tables.
>>
>> Regards,
>> Gilles
>>
>>
>>
>>
>>
>


Re: [Statistics] Port codes from Commons Math

2018-03-24 Thread Gimhana Nadeeshan
Hello devs,


> Note that some of the repositories included in that screen do
> not belong to "Commons":
>  * sling-*
>  * webservices-*
>  * xml-*


I'm working on it. (Still researching Kibble :-) )

> Botched alignments...
> "cloc" has several output formats from which you could produce
> nicer tables.


I'm extremely sorry. I'll fix it asap.

Best Regards,
Gimhana

On 23 March 2018 at 17:43, Gilles  wrote:

> Hi Gimhana.
>
> On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:
>
>> Hello devs,
>>
>> By gone through @Gilles suggestions I found very interesting facts about
>> Commons projects.
>>
>> Feel free to check Kibble reports
>>
>> > bfilter=commons=true=1458585000=1521743399>
>> regarding these projects. It will be given a clear picture on the progress
>> of projects.In the Commons Projects side it seems visible growth of
>> contributors and releases.
>>
>
> Note that some of the repositories included in that screen do
> not belong to "Commons":
>  * sling-*
>  * webservices-*
>  * xml-*
>
> There should be a way to filter them out.
>
> And I created a simple doc using the data collected from CLOC tool to get
>> an idea of commons projects. I think This kind of document will help new
>> volunteers to get a rough idea of the scope and the current status of
>> projects before go deeper.Histogram of Commons Projects.
>>
>> > 7V8LSglgsV5hBxVnLiCI/edit?usp=sharing>
>>
>
> Botched alignments...
> "cloc" has several output formats from which you could produce
> nicer tables.
>
> Regards,
> Gilles
>
>
>
>
>


Re: [Statistics] Port codes from Commons Math

2018-03-23 Thread Gilles

Hi Gimhana.

On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:

> Hello devs,
>
> By gone through @Gilles suggestions I found very interesting facts about
> Commons projects.
>
> Feel free to check Kibble reports
> regarding these projects. It will be given a clear picture on the progress
> of projects. In the Commons Projects side it seems visible growth of
> contributors and releases.


Note that some of the repositories included in that screen do
not belong to "Commons":
 * sling-*
 * webservices-*
 * xml-*

There should be a way to filter them out.

> And I created a simple doc using the data collected from CLOC tool to get
> an idea of commons projects. I think This kind of document will help new
> volunteers to get a rough idea of the scope and the current status of
> projects before go deeper. Histogram of Commons Projects.




Botched alignments...
"cloc" has several output formats from which you could produce
nicer tables.

Regards,
Gilles





Re: [Statistics] Port codes from Commons Math

2018-03-22 Thread Gimhana Nadeeshan
Hello devs,

Having gone through @Gilles's suggestions, I found some very interesting
facts about the Commons projects.

Feel free to check the Kibble reports regarding these projects. They give a
clear picture of the projects' progress. On the Commons side, there seems to
be visible growth in contributors and releases.

I also created a simple doc, "Histogram of Commons Projects", using the data
collected with the CLOC tool to get an idea of the Commons projects. I think
this kind of document will help new volunteers get a rough idea of the scope
and current status of the projects before going deeper.


Best Regards,
Gimhana.


Re: [Statistics] Port codes from Commons Math

2018-03-21 Thread Gilles

Hi Gimhana.

On Tue, 20 Mar 2018 14:36:10 +0530, Gimhana Nadeeshan wrote:

> Hello devs,
>
> I have updated my draft proposal with @Gilles's suggestions [Draft
> Proposal V1.1]
> and I think I need some more clarifications on below suggestions


I haven't read the new version yet; I'll try to answer
some of the below questions.





>> == "Background" section ==
>>
>> number of listed/active contributors
>> histogram of component's sizes (lines of code)
>
> How to recognize "active contributors" ?

Good question!
Perhaps organize a survey? :-)

> In the ML


A possible source, but probably not very efficient.  I certainly
do not suggest to perform a "manual" counting of who talks about
what. ;-)
Actually I don't know how to make automated queries (if possible).

You might want to have a look at "kibble":
  http://kibble.apache.org/
I asked that "Commons" projects be added to their "live demo".
But never got much time to explore the information it extracts
from various data sources.
I'd guess that it would be quite interesting to get more acquainted
with that tool.  Don't hesitate to subscribe to their ML (I'm not);
you might then report here what you found useful and which a lot of
us may not be aware of.


> or GitHub?


AFAICT, it would be completely biased.  Indeed if one looks at
that page, for example:
  https://github.com/apache/commons-rng/graphs/contributors
there is absolutely no trace that someone not mentioned there
performed 88% of all commits. [And this is much lower than the
actual number of deletions/additions...]


> What do you
> mean by "histogram of component's sizes (lines of code)" ?


There is a (command-line) tool called "cloc".
You could "clone" the repositories (of a selection of the
active or popular components) and run it on the "src/main"
directory to get some indication of the size of the projects.


>> == "Deliverables" section ==
>>
>>  * "less dependencies" (an example?)
>>  * "Advanced mathematical functionalities": other than what
>>    exists now?  Or do you mean new interfaces (e.g. in
>>    accordance with the APIs provided by JDK8)?
>
> Most of the classes in "math4.stat" contains "math4.exception" classes. And
> some classes in "correlations module" dependent to "RealMatrix" interfaces.
> Can those be considered as dependencies ?

Yes.  And we don't want any of the new components to depend
on "Commons Math" code. [With perhaps an exception for the "test"
scope.]

> Can't exceptions be substituted
> with inbuilt java exceptions ?


Certainly.  You should take a look at what is done in "Commons
Numbers".
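
The substitution discussed above could look roughly like the sketch below.
It is only an illustration, not code taken from "Commons Math" or "Commons
Numbers": the class name "ConfidenceLevelCheck" and its method are
hypothetical, and the only API relied upon is the JDK's own
IllegalArgumentException.

```java
/** Hypothetical helper; not part of any Commons component. */
final class ConfidenceLevelCheck {
    private ConfidenceLevelCheck() {}

    /**
     * Validates a confidence level with a JDK exception instead of a
     * library-specific one (such as those in "math4.exception").
     *
     * @param level value that must lie strictly between 0 and 1
     * @return the validated level
     * @throws IllegalArgumentException if level is NaN or outside (0, 1)
     */
    static double requireOpenUnitInterval(double level) {
        // The negated comparison also rejects NaN.
        if (!(level > 0 && level < 1)) {
            throw new IllegalArgumentException(
                "Confidence level " + level + " is out of range (0, 1)");
        }
        return level;
    }

    public static void main(String[] args) {
        System.out.println(requireOpenUnitInterval(0.95));
        try {
            requireOpenUnitInterval(1.5);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With this approach the ported class needs nothing beyond "java.lang", so
the dependency on the old exception package disappears.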


> @Gilles would you please explain this matrix
> issue because I didn't get it much


Design issues were identified a long time ago.
You should be able to find them by doing a "Search issues"
in JIRA.
Bottom-line is we don't want to depend on an API that must
be changed (at some point).
Although not ideal, a workaround is to copy the necessary
functionality over to the new component but ensure that it
is *not* part of the API.
Better would be to tackle the issues themselves but it has
proven difficult and was postponed several times...
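
The workaround described above (copy the functionality but keep it out of
the API) might be sketched as follows. Everything here is hypothetical:
"ColumnMeans" and "Table" are made-up names, and "Table" merely stands in
for whatever matrix code would be copied. The point is only that the
method signature exposes JDK types (double[][]) while the helper stays
private. In a real module the outer class would be declared public.

```java
/** Hypothetical sketch; in a real module this class would be public. */
final class ColumnMeans {
    private ColumnMeans() {}

    /** API method: only JDK types appear in the signature. */
    public static double[] of(double[][] rows) {
        return new Table(rows).columnMeans();
    }

    /**
     * Stand-in for functionality copied from another code base.
     * It is private, so callers cannot come to depend on it.
     */
    private static final class Table {
        private final double[][] data;

        Table(double[][] data) {
            if (data.length == 0) {
                throw new IllegalArgumentException("No rows");
            }
            this.data = data;
        }

        double[] columnMeans() {
            final int columns = data[0].length;
            final double[] means = new double[columns];
            for (double[] row : data) {
                if (row.length != columns) {
                    throw new IllegalArgumentException("Ragged input");
                }
                for (int j = 0; j < columns; j++) {
                    means[j] += row[j];
                }
            }
            for (int j = 0; j < columns; j++) {
                means[j] /= data.length;
            }
            return means;
        }
    }

    public static void main(String[] args) {
        double[] means = of(new double[][] {{1, 2}, {3, 4}});
        System.out.println(means[0] + " " + means[1]);
    }
}
```

Because "Table" never appears in a public signature, it could later be
replaced by a proper matrix API without breaking users.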


>> == "Implementation" section ==
>>
>>  * "Design goals": give concrete examples.
>
> I noted some examples for Design Goals in my proposal.  But I'm not sure
> that I wrote it correctly. (And don't know those are the examples you
> expect me to mention.) Please clarify those too.


IIRC, you mention streams. So for example, you could show how
the contribution would enhance usage (comparing "before"/"after").
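
A "before"/"after" comparison of that kind could be as small as the sketch
below. It is not taken from the proposal or from "Commons Math"; the class
and method names are hypothetical, and only JDK 8 classes
(java.util.stream.DoubleStream and OptionalDouble) are used.

```java
import java.util.stream.DoubleStream;

/** Hypothetical comparison of an array-based and a stream-based mean. */
final class MeanBeforeAfter {
    private MeanBeforeAfter() {}

    /** "Before": explicit loop over an array held in memory. */
    static double meanWithLoop(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("No data");
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** "After": the caller passes any DoubleStream (array, file, generator). */
    static double meanWithStream(DoubleStream values) {
        return values.average()
            .orElseThrow(() -> new IllegalArgumentException("No data"));
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        System.out.println(meanWithLoop(data));
        System.out.println(meanWithStream(DoubleStream.of(data)));
    }
}
```

The stream variant does not require the data to be materialized as an
array first, which is the kind of usage enhancement the proposal could
spell out.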



>> == "Results" section ==
>>
>> Hope to get comment from PMC...
>> [Wish list, design requirements, mentor(s), etc.]
>
> mentor(s)?? @Gilles,@Eric won't you guys be the mentors of this project ??
> I'm asking this because I'm new to ASF and GSoC !! And I'm appreciate to
> know how this is working !!


I don't know whether there is an official "mentor" role for GSoC,
and if so, what that implies.  This is also new for me; so I hope
that people can give advice about the "administrative" side.

For the contents, Eric indeed proposed to participate, but I
don't know how available he is (and how this will fit with
the GSoC timetable, which I also don't know).
It seems that Eric's contributions are currently extremely
asynchronous... :-}


>> In the meantime, you could also review the open issues for "Commons
>> Numbers":
>>   https://issues.apache.org/jira/projects/NUMBERS/
>>
>> This is quite important as almost all other "Commons Math"
>> spin-offs will have some dependency on this new component;
>> hence a release of "Commons Numbers" must precede a release
>> of either "Commons Statistics" or "Commons Geometry".
>
> Yep Wow sure...I'm on my way right now CM Numbers!!


Whenever you find something you can tackle, please
submit a PR.

Thanks,
Gilles


[...]




Re: [Statistics] Port codes from Commons Math

2018-03-20 Thread Gimhana Nadeeshan
Hello devs,

I have updated my draft proposal with @Gilles's suggestions [Draft Proposal
V1.1], and I think I need some more clarification on the suggestions below.


> == "Background" section ==
>
> number of listed/active contributors
> histogram of component's sizes (lines of code)

How do I recognize "active contributors"? In the ML or on GitHub? What do you
mean by "histogram of component's sizes (lines of code)"?


> == "Deliverables" section ==
>
>  * "less dependencies" (an example?)
>  * "Advanced mathematical functionalities": other than what
>    exists now?  Or do you mean new interfaces (e.g. in
>    accordance with the APIs provided by JDK8)?

Most of the classes in "math4.stat" use "math4.exception" classes, and
some classes in the correlation module depend on the "RealMatrix" interfaces.
Can those be considered dependencies? Can't the exceptions be replaced
with built-in Java exceptions? @Gilles, would you please explain this matrix
issue? I didn't quite get it.


> == "Implementation" section ==
>
>  * "Design goals": give concrete examples.

I noted some examples for Design Goals in my proposal, but I'm not sure
that I wrote them correctly (and I don't know whether those are the examples
you expect me to mention). Please clarify those too.

> == "Results" section ==
>
> Hope to get comment from PMC...
> [Wish list, design requirements, mentor(s), etc.]

Mentor(s)?? @Gilles, @Eric, won't you be the mentors of this project?
I'm asking because I'm new to ASF and GSoC, and I'd appreciate
knowing how this works!

> In the meantime, you could also review the open issues for "Commons
> Numbers":
>   https://issues.apache.org/jira/projects/NUMBERS/
>
> This is quite important as almost all other "Commons Math"
> spin-offs will have some dependency on this new component;
> hence a release of "Commons Numbers" must precede a release
> of either "Commons Statistics" or "Commons Geometry".

Yep, sure... I'm on my way to Commons Numbers right now!!


Nadeeshan Gimhana
Batch Representative (15' batch)
Department of Computer Science & Engineering
University of Moratuwa
Mobile: +94775744613
Website: https://ngimhana94.wixsite.com/gimhanadesilva/
LinkedIn: www.linkedin.com/in/nadeeshangimhana/


On 19 March 2018 at 03:46, Gilles  wrote:

> Hello.
>
> On Sun, 18 Mar 2018 23:29:44 +0530, Gimhana Nadeeshan wrote:
>
>> Hi ,
>>
>> Thanks a lot Gilles for your valuable suggestions and give the reviews so
>> quickly. I'll apply those corrections asked for any clarifications in
>> here.
>> By the way since I'm new to Apache Community I'm not yet familiar with
>> some
>> abbreviations used in the list. [such as ML archive, PMC ]
>>
>
> Sorry!
> ML = Mailing List
> PMC = Project Management Committee
>
>
>> AFAICT, porting "o.a.c.math4.geometry" will be much
>>
>>> easier and likely to be finished before "Commons
>>> Statistics". :-}
>>>
>>>
>> Since the design structure is the same, this would be interesting and
>> easier. But is it allowed in GSoC? [Since it not labeled as GSoC idea at
>> JIRA !!]
>>
>
> If it's just a matter of creating a GSoC task, not a big problem. ;-)
> For would-be "Commons Geometry", I'm waiting for the green light
> from our expert contributor, Matt Juntunen.
> In the meantime, you could also review the open issues for "Commons
> Numbers":
>   https://issues.apache.org/jira/projects/NUMBERS/
>
> This is quite important as almost all other "Commons Math"
> spin-offs will have some dependency on this new component;
> hence a release of "Commons Numbers" must precede a release
> of either "Commons Statistics" or "Commons Geometry".
>
> Best,
> Gilles
>
> Best Regards,
>> Gimhana.
>>
>> [...]

>>>
>
>
>


Re: [Statistics] Port codes from Commons Math

2018-03-18 Thread Gilles

Hello.

On Sun, 18 Mar 2018 23:29:44 +0530, Gimhana Nadeeshan wrote:

> Hi ,
>
> Thanks a lot Gilles for your valuable suggestions and give the reviews so
> quickly. I'll apply those corrections asked for any clarifications in here.
> By the way since I'm new to Apache Community I'm not yet familiar with some
> abbreviations used in the list. [such as ML archive, PMC ]


Sorry!
ML = Mailing List
PMC = Project Management Committee



>> AFAICT, porting "o.a.c.math4.geometry" will be much
>> easier and likely to be finished before "Commons
>> Statistics". :-}
>
> Since the design structure is the same, this would be interesting and
> easier. But is it allowed in GSoC? [Since it not labeled as GSoC idea at
> JIRA !!]


If it's just a matter of creating a GSoC task, not a big problem. ;-)
For would-be "Commons Geometry", I'm waiting for the green light
from our expert contributor, Matt Juntunen.
In the meantime, you could also review the open issues for "Commons
Numbers":
  https://issues.apache.org/jira/projects/NUMBERS/

This is quite important as almost all other "Commons Math"
spin-offs will have some dependency on this new component;
hence a release of "Commons Numbers" must precede a release
of either "Commons Statistics" or "Commons Geometry".

Best,
Gilles


> Best Regards,
> Gimhana.
>
> [...]






Re: [Statistics] Port codes from Commons Math

2018-03-18 Thread Gimhana Nadeeshan
Hi,

Thanks a lot, Gilles, for your valuable suggestions and for reviewing so
quickly. I'll apply those corrections and ask for any clarifications here.
By the way, since I'm new to the Apache community, I'm not yet familiar with
some abbreviations used on the list [such as ML archive, PMC].

> AFAICT, porting "o.a.c.math4.geometry" will be much
> easier and likely to be finished before "Commons
> Statistics". :-}

Since the design structure is the same, this would be interesting and
easier. But is it allowed in GSoC? [It is not labeled as a GSoC idea in
JIRA!]

Best Regards,
Gimhana.

On 18 March 2018 at 21:18, Gilles  wrote:

> Hi Gimhana.
>
> On Sun, 18 Mar 2018 19:17:44 +0530, Gimhana Nadeeshan wrote:
>
>> Hii,
>>
>> I have just shared my draft proposal for GSoC. Port Codes from Commons
>> Math.
>>
>> > OBOqTOeMnPaBsE9U5YhU/edit>
>>
>
> Wow; probably the first time that such a structured document
> appears on this list. ;-)
>
> Devs, would you please review it and I always welcome your precious
>> suggestions to improve it.
>>
>
> OK.  I'll try to provide some clarifications and words of
> caution.
>
> == "Background" section ==
> Useful to cite:
> (for Commons in general)
>  * number of stable/active/dormant components
>  * number of listed/active contributors
>  * overview of topics covered
>  * histogram of component's sizes (lines of code)
> (for Commons Math)
>  * how it fits within the above data
>
> And draw some conclusions out of the comparison.
> You stress "before JDK 1.8"; worth noting that some codes
> dates back to before JDK 1.5!
> Code age is not necessarily a problem per se, but the mix
> (of designs linked to outdated JDK) is, IMHO, a development
> nightmare.
> Modularization can alleviate the unwanted consequences (such
> as release stalled due to the lack of support).
>
> == "Deliverables" section ==
>
> Clarify what is meant by
>  * "less dependencies" (an example?)
>  * "Advanced mathematical functionalities": other than what
>exists now?  Or do you mean new interfaces (e.g. in
>accordance with the APIs provided by JDK8)?
>  * "implemented module" (singular). I would assume that
>"Commons Statistics" will provide many modules.
>  * "Guide for refactoring [..] Commons packages": That is
>unlikely. ;-)
>Did you more modestly mean "Commons Math packages"?
>You should perhaps note (in the "Background" section)
>that the task has been started two year ago (cf.
>"Commons RNG" and "Commons Numbers").
>
> Another quite useful task is: set up the web site.
>
> == "Implementation" section ==
>
>  * "Design issues": list *actual* issues (see JIRA).
>Working with stream would better be described as an
>enhancement.
>  * Describe "too many dependencies" (examples).
>  * "Design goals": give concrete examples.
>
> The class diagram is nice but I see a big issue with
> the "matrix" functionality. [This was one of the reason
> I wrote a few months ago (cf. ML archive) that the
> refactoring of the "o.a.c.math4.stat" was not among the
> low-hanging fruits of the refactoring.]
> If ever possible, better start with functionality that
> doesn't need the CM matrix code.
>
> == "Results" section ==
>
> Hope to get comment from PMC...
> [Wish list, design requirements, mentor(s), etc.]
>
> == "Future Development" section ==
>
> AFAICT, porting "o.a.c.math4.geometry" will be much
> easier and likely to be finished before "Commons
> Statistics". :-}
>
>
> Thanks for your interest,
>
> Gilles
>
> Best Regards,
>> Gimhana
>>
>> On 17 March 2018 at 05:06, Gilles  wrote:
>>
>> Hi.
>>>
>>> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>>>
>>> Hi devs,

 Sorry for the delayed reply due to my academics.


 If you want to start playing with the code, we could just begin

> by having discussions here (on design) and on JIRA (for processing
> minor issues) based on the current state of your repository.
> [What's the link to look it up?]
>
>
> Should I create my own repo and start code in there?[Not in the forked
 repo]


>>> What's the difference?  IOW, someone else should answer. :-}
>>>
>>> Actually it will be more helpful to me if someone [ @Gilles or @Eric ]
>>> can
>>>
 guide me more. Like, to give me some minor issues in the current
 implementation to solve or as a new feature implementation and gradually
 we
 can go for deeper


>>> IMO, the top priority would be to release "Commons Numbers":
>>>   http://commons.apache.org/proper/commons-numbers/
>>>
>>> There are some blocking issues on JIRA:
>>>   https://issues.apache.org/jira/projects/NUMBERS
>>>
>>> and eventually I can go further my my own way.  Then I
>>>
 can gradually familiar with the code and I think it is the most
 efficient
 way to learn the design architecture.[I spent 

Re: [Statistics] Port codes from Commons Math

2018-03-18 Thread Gilles

Hi Gimhana.

On Sun, 18 Mar 2018 19:17:44 +0530, Gimhana Nadeeshan wrote:

> Hii,
>
> I have just shared my draft proposal for GSoC. Port Codes from Commons Math.

Wow; probably the first time that such a structured document
appears on this list. ;-)

> Devs, would you please review it and I always welcome your precious
> suggestions to improve it.


OK.  I'll try to provide some clarifications and words of
caution.

== "Background" section ==
Useful to cite:
(for Commons in general)
 * number of stable/active/dormant components
 * number of listed/active contributors
 * overview of topics covered
 * histogram of component's sizes (lines of code)
(for Commons Math)
 * how it fits within the above data

And draw some conclusions out of the comparison.
You stress "before JDK 1.8"; worth noting that some codes
dates back to before JDK 1.5!
Code age is not necessarily a problem per se, but the mix
(of designs linked to outdated JDK) is, IMHO, a development
nightmare.
Modularization can alleviate the unwanted consequences (such
as release stalled due to the lack of support).

== "Deliverables" section ==

Clarify what is meant by
 * "less dependencies" (an example?)
 * "Advanced mathematical functionalities": other than what
   exists now?  Or do you mean new interfaces (e.g. in
   accordance with the APIs provided by JDK8)?
 * "implemented module" (singular). I would assume that
   "Commons Statistics" will provide many modules.
 * "Guide for refactoring [..] Commons packages": That is
   unlikely. ;-)
   Did you more modestly mean "Commons Math packages"?
   You should perhaps note (in the "Background" section)
   that the task has been started two year ago (cf.
   "Commons RNG" and "Commons Numbers").

Another quite useful task is: set up the web site.

== "Implementation" section ==

 * "Design issues": list *actual* issues (see JIRA).
   Working with stream would better be described as an
   enhancement.
 * Describe "too many dependencies" (examples).
 * "Design goals": give concrete examples.

The class diagram is nice but I see a big issue with
the "matrix" functionality. [This was one of the reason
I wrote a few months ago (cf. ML archive) that the
refactoring of the "o.a.c.math4.stat" was not among the
low-hanging fruits of the refactoring.]
If ever possible, better start with functionality that
doesn't need the CM matrix code.

== "Results" section ==

Hope to get comment from PMC...
[Wish list, design requirements, mentor(s), etc.]

== "Future Development" section ==

AFAICT, porting "o.a.c.math4.geometry" will be much
easier and likely to be finished before "Commons
Statistics". :-}


Thanks for your interest,
Gilles


> Best Regards,
> Gimhana
>
> On 17 March 2018 at 05:06, Gilles wrote:
>
>> Hi.
>>
>> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>>
>>> Hi devs,
>>>
>>> Sorry for the delayed reply due to my academics.
>>>
>>>> If you want to start playing with the code, we could just begin
>>>> by having discussions here (on design) and on JIRA (for processing
>>>> minor issues) based on the current state of your repository.
>>>> [What's the link to look it up?]
>>>
>>> Should I create my own repo and start code in there?[Not in the forked
>>> repo]
>>
>> What's the difference?  IOW, someone else should answer. :-}
>>
>>> Actually it will be more helpful to me if someone [ @Gilles or @Eric ] can
>>> guide me more. Like, to give me some minor issues in the current
>>> implementation to solve or as a new feature implementation and gradually
>>> we can go for deeper
>>
>> IMO, the top priority would be to release "Commons Numbers":
>>   http://commons.apache.org/proper/commons-numbers/
>>
>> There are some blocking issues on JIRA:
>>   https://issues.apache.org/jira/projects/NUMBERS
>>
>>> and eventually I can go further my my own way.  Then I
>>> can gradually familiar with the code and I think it is the most efficient
>>> way to learn the design architecture.[I spent hours to understand the
>>> current code basis and I felt that was not so efficient as I thought]
>>
>> Refactoring the package "stat" is not straightforward...
>> However, to get to that, it would be useful to record your thoughts
>> as you browse through the code(s): what seems easy to port, what should
>> be changed/fixed, what you don't understand, and so on.
>>
>>> And if there is a format of Proposal regarding ASF ?
>>
>> I don't think so.  This ML is the forum where project directions
>> are discussed.
>>
>>> If not what should I
>>> mention in the proposal basically?
>>
>> This can be a work in progress, I think (see above suggestions).
>>
>> Best regards,
>> Gilles
>>
>>> Best Regards,
>>>
>>> On 14 March 2018 at 19:07, Gilles wrote:
>>>
>>>> Hi.
>>>>
>>>> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>>>>
>>>>> Hello Devs,
>>>>>
>>>>> Thanks Gilles and Eric for guidance.
>>>>>
>>>>> I have cloned the Commons repos and forked the Common's Stat repo. Is it
>>>>> possible to make pull requests to that repo to be reviewed?
>>>>
>>>> That's certainly 

Re: [Statistics] Port codes from Commons Math

2018-03-18 Thread Gimhana Nadeeshan
Hii,

I have not decided on the timeline yet. I plan to decide it after the design
architecture is confirmed.


Best Regards,
Gimhana.

On 18 March 2018 at 19:17, Gimhana Nadeeshan <
gimhanadesilva...@cse.mrt.ac.lk> wrote:

> Hii,
>
> I have just shared my draft proposal for GSoC. Port Codes from Commons
> Math.
> 
> Devs, would you please review it and I always welcome your precious
> suggestions to improve it.
>
> Best Regards,
> Gimhana
>
> On 17 March 2018 at 05:06, Gilles  wrote:
>
>> Hi.
>>
>> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>>
>>> Hi devs,
>>>
>>> Sorry for the delayed reply due to my academics.
>>>
>>>
>>> If you want to start playing with the code, we could just begin
 by having discussions here (on design) and on JIRA (for processing
 minor issues) based on the current state of your repository.
 [What's the link to look it up?]


>>> Should I create my own repo and start code in there?[Not in the forked
>>> repo]
>>>
>>
>> What's the difference?  IOW, someone else should answer. :-}
>>
>> Actually it will be more helpful to me if someone [ @Gilles or @Eric ] can
>>> guide me more. Like, to give me some minor issues in the current
>>> implementation to solve or as a new feature implementation and gradually
>>> we
>>> can go for deeper
>>>
>>
>> IMO, the top priority would be to release "Commons Numbers":
>>   http://commons.apache.org/proper/commons-numbers/
>>
>> There are some blocking issues on JIRA:
>>   https://issues.apache.org/jira/projects/NUMBERS
>>
>> and eventually I can go further my my own way.  Then I
>>> can gradually familiar with the code and I think it is the most efficient
>>> way to learn the design architecture.[I spent hours to understand the
>>> current code basis and I felt that was not so efficient as I thought]
>>>
>>
>> Refactoring the package "stat" is not straightforward...
>> However, to get to that, it would be useful to record your thoughts
>> as you browse through the code(s): what seems easy to port, what should
>> be changed/fixed, what you don't understand, and so on.
>>
>>
>>> And if there is a format of Proposal regarding ASF ?
>>>
>>
>> I don't think so.  This ML is the forum where project directions
>> are discussed.
>>
>> If not what should I
>>> mention in the proposal basically?
>>>
>>
>> This can be a work in progress, I think (see above suggestions).
>>
>> Best regards,
>> Gilles
>>
>>
>>
>>> Best Regards,
>>>
>>>
>>>
>>>
>>> On 14 March 2018 at 19:07, Gilles  wrote:
>>>
>>> Hi.

 On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:

 Hello Devs,
>
> Thanks Gilles and Eric for guidance.
>
> I have cloned the Commons repos and forked the Common's Stat repo. Is
> it
> possible to make pull requests to that repo to be reviewed?
>
>
 That's certainly possible, but I'm afraid that it will become
 quite unwieldy from my side if I have to delete/create branches
 for every PR.

 If you want to start playing with the code, we could just begin
 by having discussions here (on design) and on JIRA (for processing
 minor issues) based on the current state of your repository.
 [What's the link to look it up?]

 Or should I

> follow a specific method?
>
>
 I'll inquire about a more efficient method (than the above)...

 By referring the API docs I got some idea of the separation of modules.

>
> In the current Commons's stat repo there are some classes under the
> package  distribution. I think those can be refactored using java 8 in
> build statistics functionalities. Please correct me if I wrong.
>
>
 An example perhaps?

 As Eric said separation of function and streaming implementations is
 good

> idea as designing. (In my point of view, it means method overloading ->
> Again correct me if I didn't understand your fact correctly)
>
>
 ?

 And I will share my draft proposal here for your review soon.

>
>
 OK.

 Thanks again for your interest,
 Gilles



 Best Regards.
>
> On 13 March 2018 at 20:50, Gilles 
> wrote:
>
> Hello.
>
>>
>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>
>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles <
>> gil...@harfang.homelinux.org>
>>
>>> wrote:
>>>
>>>
>>>
>>> Where can we find the old code before port into new Commons
 components?

>
>
> The code bases are managed by the "git" software; the whole
> history is
>
 available:
   https://git1-us-west.apache.org/repos/asf?p=commons-math.git
 ;a=log

 [I'd 

Re: [Statistics] Port codes from Commons Math

2018-03-18 Thread Gimhana Nadeeshan
Hii,

I have just shared my draft proposal for GSoC: Port Codes from Commons Math.

Devs, would you please review it? I always welcome your valuable
suggestions to improve it.

Best Regards,
Gimhana

On 17 March 2018 at 05:06, Gilles  wrote:

> Hi.
>
> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>
>> Hi devs,
>>
>> Sorry for the delayed reply due to my academics.
>>
>>
>> If you want to start playing with the code, we could just begin
>>> by having discussions here (on design) and on JIRA (for processing
>>> minor issues) based on the current state of your repository.
>>> [What's the link to look it up?]
>>>
>>>
>> Should I create my own repo and start code in there?[Not in the forked
>> repo]
>>
>
> What's the difference?  IOW, someone else should answer. :-}
>
> Actually it will be more helpful to me if someone [ @Gilles or @Eric ] can
>> guide me more. Like, to give me some minor issues in the current
>> implementation to solve or as a new feature implementation and gradually
>> we
>> can go for deeper
>>
>
> IMO, the top priority would be to release "Commons Numbers":
>   http://commons.apache.org/proper/commons-numbers/
>
> There are some blocking issues on JIRA:
>   https://issues.apache.org/jira/projects/NUMBERS
>
> and eventually I can go further my my own way.  Then I
>> can gradually familiar with the code and I think it is the most efficient
>> way to learn the design architecture.[I spent hours to understand the
>> current code basis and I felt that was not so efficient as I thought]
>>
>
> Refactoring the package "stat" is not straightforward...
> However, to get to that, it would be useful to record your thoughts
> as you browse through the code(s): what seems easy to port, what should
> be changed/fixed, what you don't understand, and so on.
>
>
>> And if there is a format of Proposal regarding ASF ?
>>
>
> I don't think so.  This ML is the forum where project directions
> are discussed.
>
> If not what should I
>> mention in the proposal basically?
>>
>
> This can be a work in progress, I think (see above suggestions).
>
> Best regards,
> Gilles
>
>
>
>> Best Regards,
>>
>>
>>
>>
>> On 14 March 2018 at 19:07, Gilles  wrote:
>>
>> Hi.
>>>
>>> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>>>
>>> Hello Devs,

 Thanks Gilles and Eric for guidance.

 I have cloned the Commons repos and forked the Common's Stat repo. Is it
 possible to make pull requests to that repo to be reviewed?


>>> That's certainly possible, but I'm afraid that it will become
>>> quite unwieldy from my side if I have to delete/create branches
>>> for every PR.
>>>
>>> If you want to start playing with the code, we could just begin
>>> by having discussions here (on design) and on JIRA (for processing
>>> minor issues) based on the current state of your repository.
>>> [What's the link to look it up?]
>>>
>>> Or should I
>>>
 follow a specific method?


>>> I'll inquire about a more efficient method (than the above)...
>>>
>>> By referring the API docs I got some idea of the separation of modules.
>>>

 In the current Commons's stat repo there are some classes under the
 package  distribution. I think those can be refactored using java 8 in
 build statistics functionalities. Please correct me if I wrong.


>>> An example perhaps?
>>>
>>> As Eric said separation of function and streaming implementations is good
>>>
 idea as designing. (In my point of view, it means method overloading ->
 Again correct me if I didn't understand your fact correctly)


>>> ?
>>>
>>> And I will share my draft proposal here for your review soon.
>>>


>>> OK.
>>>
>>> Thanks again for your interest,
>>> Gilles
>>>
>>>
>>>
>>> Best Regards.

 On 13 March 2018 at 20:50, Gilles  wrote:

 Hello.

>
> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>
> On Tue, Mar 13, 2018 at 12:47 AM, Gilles  >
>
>> wrote:
>>
>>
>>
>> Where can we find the old code before port into new Commons
>>> components?
>>>


 The code bases are managed by the "git" software; the whole history
 is

>>> available:
>>>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>
>>> [I'd advise to "clone" the repositories on your local computer, and
>>> use the command line tools.]
>>>
>>>
>>>
>> I believe you will want to clone the commons-math repositories, but
>> then
>> develop your own "fork" of the commons-statistics repository. Gilles
>> can
>> correct me if that is wrong.
>>
>>
>> Actually, I know only my workflow:
>  $ git clone ...
>  $ git 

Re: [Statistics] Port codes from Commons Math

2018-03-16 Thread Gilles

Hi.

On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:

Hi devs,

Sorry for the delayed reply due to my academics.



If you want to start playing with the code, we could just begin
by having discussions here (on design) and on JIRA (for processing
minor issues) based on the current state of your repository.
[What's the link to look it up?]



Should I create my own repo and start coding there? [Not in the forked repo]


What's the difference?  IOW, someone else should answer. :-}

Actually, it would be more helpful to me if someone [ @Gilles or @Eric ] could
guide me more, for example by giving me some minor issues in the current
implementation to solve, or a new feature to implement, so that gradually we
can go deeper


IMO, the top priority would be to release "Commons Numbers":
  http://commons.apache.org/proper/commons-numbers/

There are some blocking issues on JIRA:
  https://issues.apache.org/jira/projects/NUMBERS


and eventually I can go further on my own.  That way I
can gradually become familiar with the code, and I think it is the most efficient
way to learn the design architecture. [I spent hours trying to understand the
current code base and I felt that was not as efficient as I thought.]


Refactoring the package "stat" is not straightforward...
However, to get to that, it would be useful to record your thoughts
as you browse through the code(s): what seems easy to port, what should
be changed/fixed, what you don't understand, and so on.



And if there is a format of Proposal regarding ASF ?


I don't think so.  This ML is the forum where project directions
are discussed.


If not what should I
mention in the proposal basically?


This can be a work in progress, I think (see above suggestions).

Best regards,
Gilles



Best Regards,




On 14 March 2018 at 19:07, Gilles  
wrote:



Hi.

On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:


Hello Devs,

Thanks Gilles and Eric for guidance.

I have cloned the Commons repos and forked the Common's Stat repo. 
Is it

possible to make pull requests to that repo to be reviewed?



That's certainly possible, but I'm afraid that it will become
quite unwieldy from my side if I have to delete/create branches
for every PR.

If you want to start playing with the code, we could just begin
by having discussions here (on design) and on JIRA (for processing
minor issues) based on the current state of your repository.
[What's the link to look it up?]

Or should I

follow a specific method?



I'll inquire about a more efficient method (than the above)...

By referring the API docs I got some idea of the separation of 
modules.


In the current Commons's stat repo there are some classes under the
package  distribution. I think those can be refactored using java 8 
in

build statistics functionalities. Please correct me if I wrong.



An example perhaps?

As Eric said separation of function and streaming implementations is 
good
idea as designing. (In my point of view, it means method 
overloading ->

Again correct me if I didn't understand your fact correctly)



?

And I will share my draft proposal here for your review soon.




OK.

Thanks again for your interest,
Gilles




Best Regards.

On 13 March 2018 at 20:50, Gilles  
wrote:


Hello.


On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:

On Tue, Mar 13, 2018 at 12:47 AM, Gilles 


wrote:



Where can we find the old code before port into new Commons 
components?



The code bases are managed by the "git" software; the whole 
history is

available:
  
https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log


[I'd advise to "clone" the repositories on your local computer, 
and

use the command line tools.]




I believe you will want to clone the commons-math repositories, 
but then
develop your own "fork" of the commons-statistics repository. 
Gilles can

correct me if that is wrong.



Actually, I know only my workflow:
 $ git clone ...
 $ git branch ...
 $ git commit ...
 $ git push

:-}

I didn't find it very easy to cooperate with developers who
fork on GitHub and submit PRs.
I've now found the "git" command that creates a branch from
a PR, but it would be so much more comfortable to just switch
directory and do "git pull".

In the context of GSoC, would it be possible to grant some
privilege to non-committers so that they can update a selected
"git" repository?
If not, what is the next easiest way to share a "common space"
(aka "sandbox") from which it would be easy to copy reviewed
bits over to the official source repository?


As


you mentioned it will be a good approach to redesign process.



You don't necessarily need to analyze how the code was before

the port/refactoring; looking at how it is now is sufficient,
unless you suspect that something is wrong now and might have
been better before. ;-)


In particular, the statistics library was designed before Java 
8. Java

8

Re: [Statistics] Port codes from Commons Math

2018-03-16 Thread Gimhana Nadeeshan
Hi devs,

Sorry for the delayed reply due to my academics.


> If you want to start playing with the code, we could just begin
> by having discussions here (on design) and on JIRA (for processing
> minor issues) based on the current state of your repository.
> [What's the link to look it up?]
>

Should I create my own repo and start coding there? [Not in the forked repo]

Actually, it would be more helpful to me if someone [ @Gilles or @Eric ] could
guide me more, for example by giving me some minor issues in the current
implementation to solve, or a new feature to implement. Gradually we
can go deeper, and eventually I can go further on my own. That way I
can gradually become familiar with the code, and I think it is the most efficient
way to learn the design architecture. [I spent hours trying to understand the
current code base and I felt that was not as efficient as I thought.]

Is there a proposal format required by the ASF? If not, what should I
basically mention in the proposal?

Best Regards,




On 14 March 2018 at 19:07, Gilles  wrote:

> Hi.
>
> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>
>> Hello Devs,
>>
>> Thanks Gilles and Eric for guidance.
>>
>> I have cloned the Commons repos and forked the Common's Stat repo. Is it
>> possible to make pull requests to that repo to be reviewed?
>>
>
> That's certainly possible, but I'm afraid that it will become
> quite unwieldy from my side if I have to delete/create branches
> for every PR.
>
> If you want to start playing with the code, we could just begin
> by having discussions here (on design) and on JIRA (for processing
> minor issues) based on the current state of your repository.
> [What's the link to look it up?]
>
> Or should I
>> follow a specific method?
>>
>
> I'll inquire about a more efficient method (than the above)...
>
> By referring the API docs I got some idea of the separation of modules.
>>
>> In the current Commons's stat repo there are some classes under the
>> package  distribution. I think those can be refactored using java 8 in
>> build statistics functionalities. Please correct me if I wrong.
>>
>
> An example perhaps?
>
> As Eric said separation of function and streaming implementations is good
>> idea as designing. (In my point of view, it means method overloading ->
>> Again correct me if I didn't understand your fact correctly)
>>
>
> ?
>
> And I will share my draft proposal here for your review soon.
>>
>
> OK.
>
> Thanks again for your interest,
> Gilles
>
>
>
>> Best Regards.
>>
>> On 13 March 2018 at 20:50, Gilles  wrote:
>>
>> Hello.
>>>
>>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>>
>>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles 
 wrote:



> Where can we find the old code before port into new Commons components?
>>
>>
>> The code bases are managed by the "git" software; the whole history is
> available:
>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>
> [I'd advise to "clone" the repositories on your local computer, and
> use the command line tools.]
>
>

 I believe you will want to clone the commons-math repositories, but then
 develop your own "fork" of the commons-statistics repository. Gilles can
 correct me if that is wrong.


>>> Actually, I know only my workflow:
>>>  $ git clone ...
>>>  $ git branch ...
>>>  $ git commit ...
>>>  $ git push
>>>
>>> :-}
>>>
>>> I didn't find it very easy to cooperate with developers who
>>> fork on GitHub and submit PRs.
>>> I've now found the "git" command that creates a branch from
>>> a PR, but it would be so much more comfortable to just switch
>>> directory and do "git pull".
>>>
>>> In the context of GSoC, would it be possible to grant some
>>> privilege to non-committers so that they can update a selected
>>> "git" repository?
>>> If not, what is the next easiest way to share a "common space"
>>> (aka "sandbox") from which it would be easy to copy reviewed
>>> bits over to the official source repository?
>>>
>>>
>>> As
>
> you mentioned it will be a good approach to redesign process.
>>
>>
>> You don't necessarily need to analyze how the code was before
> the port/refactoring; looking at how it is now is sufficient,
> unless you suspect that something is wrong now and might have
> been better before. ;-)
>
>
> In particular, the statistics library was designed before Java 8. Java
 8
 however has provided both efficient programming strategies for these
 statistical methods (in the form of lambdas and streams) as well as some
 built-in methods providing summary statistics functions (see discussion
 at
 http://markmail.org/message/7t2mjaprsuvb3waj).


>>> Very good point, indeed.
>>> IMO, the new component should be targeted Java 8.
>>> Even Java 9 (enforcing modularity with 

Re: [Statistics] Port codes from Commons Math

2018-03-14 Thread Gilles

Hi.

On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:

Hello Devs,

Thanks Gilles and Eric for guidance.

I have cloned the Commons repos and forked the Common's Stat repo. Is it
possible to make pull requests to that repo to be reviewed?


That's certainly possible, but I'm afraid that it will become
quite unwieldy from my side if I have to delete/create branches
for every PR.

If you want to start playing with the code, we could just begin
by having discussions here (on design) and on JIRA (for processing
minor issues) based on the current state of your repository.
[What's the link to look it up?]


Or should I
follow a specific method?


I'll inquire about a more efficient method (than the above)...

By referring the API docs I got some idea of the separation of 
modules.


In the current Commons Statistics repo there are some classes under the
package "distribution". I think those can be refactored using the Java 8
built-in statistics functionalities. Please correct me if I am wrong.


An example perhaps?

As Eric said separation of function and streaming implementations is 
good
idea as designing. (In my point of view, it means method overloading 
->

Again correct me if I didn't understand your fact correctly)


?


And I will share my draft proposal here for your review soon.


OK.

Thanks again for your interest,
Gilles



Best Regards.

On 13 March 2018 at 20:50, Gilles  
wrote:



Hello.

On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:

On Tue, Mar 13, 2018 at 12:47 AM, Gilles 


wrote:




Where can we find the old code before port into new Commons 
components?



The code bases are managed by the "git" software; the whole 
history is

available:
  
https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log


[I'd advise to "clone" the repositories on your local computer, 
and

use the command line tools.]




I believe you will want to clone the commons-math repositories, but 
then
develop your own "fork" of the commons-statistics repository. 
Gilles can

correct me if that is wrong.



Actually, I know only my workflow:
 $ git clone ...
 $ git branch ...
 $ git commit ...
 $ git push

:-}

I didn't find it very easy to cooperate with developers who
fork on GitHub and submit PRs.
I've now found the "git" command that creates a branch from
a PR, but it would be so much more comfortable to just switch
directory and do "git pull".

In the context of GSoC, would it be possible to grant some
privilege to non-committers so that they can update a selected
"git" repository?
If not, what is the next easiest way to share a "common space"
(aka "sandbox") from which it would be easy to copy reviewed
bits over to the official source repository?



As


you mentioned it will be a good approach to redesign process.



You don't necessarily need to analyze how the code was before
the port/refactoring; looking at how it is now is sufficient,
unless you suspect that something is wrong now and might have
been better before. ;-)


In particular, the statistics library was designed before Java 8. Java 8
however has provided both efficient programming strategies for these
statistical methods (in the form of lambdas and streams) as well as some
built-in methods providing summary statistics functions (see discussion at
http://markmail.org/message/7t2mjaprsuvb3waj).



Very good point, indeed.
IMO, the new component should target Java 8.
Even Java 9 (enforcing modularity with JPMS): if by the time we think
of releasing the code, we still want to avoid "multi-release" JARs, it
will be easy to just remove the "module-info" files (I don't think much
else Java 9 specific would be used by "Commons Statistics").

In fact, given the very slow pace at which new components are being
brought to releasable state, I'd like to ask whether it would be OK
to make "incremental" releases?  That would mean: focus on (maven)
modules that seem close to feature-complete and bug-free, fix the
remaining issues and perform a release with that module added.

It seems that the expectations were set too high (content-wise given
the amount of human resources), so that neither CM can be released
(too many non-fixed issues) nor its "Commons Numbers" spin-off that
contains many modules, some of which are blocked by lack of consensus
or dangling discussions.

It probably makes sense, as a design strategy, to separate the function
implementation from the streaming implementation. For example, a 2D integer
array will probably require a different streaming implementation than a 1D
double array, but they can probably both be passed the same function
handle to collect, say, the mean or max value.

The role of commons might then be to provide a convenient interface, so
that the user can simply call a static method like SummaryStats.mean() and
not have to worry about the implementation.

The other difficulty I see, is that quantile and median statistics 

Re: [Statistics] Port codes from Commons Math

2018-03-13 Thread Gimhana Nadeeshan
Hello Devs,

Thanks Gilles and Eric for guidance.

I have cloned the Commons repos and forked the Commons Statistics repo. Is it
possible to make pull requests to that repo to be reviewed? Or should I
follow a specific method?

By referring to the API docs I got some idea of the separation of modules.

In the current Commons Statistics repo there are some classes under the
package "distribution". I think those can be refactored using the Java 8
built-in statistics functionalities. Please correct me if I am wrong.

As Eric said, separating the function and streaming implementations is a good
design idea. (From my point of view, it means method overloading ->
again, correct me if I didn't understand your point correctly.)

And I will share my draft proposal here for your review soon.

Best Regards.

On 13 March 2018 at 20:50, Gilles  wrote:

> Hello.
>
> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>
>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles 
>> wrote:
>>
>>
>>>
 Where can we find the old code before port into new Commons components?


>>> The code bases are managed by the "git" software; the whole history is
>>> available:
>>>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>
>>> [I'd advise to "clone" the repositories on your local computer, and
>>> use the command line tools.]
>>>
>>
>>
>> I believe you will want to clone the commons-math repositories, but then
>> develop your own "fork" of the commons-statistics repository. Gilles can
>> correct me if that is wrong.
>>
>
> Actually, I know only my workflow:
>  $ git clone ...
>  $ git branch ...
>  $ git commit ...
>  $ git push
>
> :-}
>
> I didn't find it very easy to cooperate with developers who
> fork on GitHub and submit PRs.
> I've now found the "git" command that creates a branch from
> a PR, but it would be so much more comfortable to just switch
> directory and do "git pull".
>
> In the context of GSoC, would it be possible to grant some
> privilege to non-committers so that they can update a selected
> "git" repository?
> If not, what is the next easiest way to share a "common space"
> (aka "sandbox") from which it would be easy to copy reviewed
> bits over to the official source repository?
>
>
>>> As
>>>
 you mentioned it will be a good approach to redesign process.


>>> You don't necessarily need to analyze how the code was before
>>> the port/refactoring; looking at how it is now is sufficient,
>>> unless you suspect that something is wrong now and might have
>>> been better before. ;-)
>>>
>>>
>> In particular, the statistics library was designed before Java 8. Java 8
>> however has provided both efficient programming strategies for these
>> statistical methods (in the form of lambdas and streams) as well as some
>> built-in methods providing summary statistics functions (see discussion at
>> http://markmail.org/message/7t2mjaprsuvb3waj).
>>
>
> Very good point, indeed.
> IMO, the new component should target Java 8.
> Even Java 9 (enforcing modularity with JPMS): if by the time we think
> of releasing the code, we still want to avoid "multi-release" JARs, it
> will be easy to just remove the "module-info" files (I don't think much
> else Java 9 specific would be used by "Commons Statistics").
>
> In fact, given the very slow pace at which new components are being
> brought to releasable state, I'd like to ask whether it would be OK
> to make "incremental" releases?  That would mean: focus on (maven)
> modules that seem close to feature-complete and bug-free, fix the
> remaining issues and perform a release with that module added.
>
> It seems that the expectations were set too high (content-wise given
> the amount of human resources), so that neither CM can be released
> (too many non-fixed issues) nor its "Commons Numbers" spin-off that
> contains many modules, some of which are blocked by lack of consensus
> or dangling discussions.
>
> It probably makes sense, as a design strategy, to separate the function
>> implementation from the streaming implementation. For example, a 2D
>> integer
>> array will probably require a different streaming implementation than a 1D
>> double array, but they can  probably both be passed the same function
>> handle to collect, say, the mean or max value.
>>
>> The role of commons might then be to provide a convenient interface, so
>> that the user can simply call a static method like SummaryStats.mean() and
>> not have to worry about the implementation.
>>
>> The other difficulty I see, is that quantile and median statistics will
>> not
>> be as easy to stream as statistics with a closed-form solution like mean
>> or
>> variance. There may however be great algorithms out there for pulling the
>> median or the 95% quantile out of a stream -- if so they should be used.
>>
>> Eric
>>
>
> Eric,
>
> Would you be the official "mentor" for the GSoC participants that
> are interested in helping with the porting of 

Re: [Statistics] Port codes from Commons Math

2018-03-13 Thread Gilles

Hello.

On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
On Tue, Mar 13, 2018 at 12:47 AM, Gilles 


wrote:





Where can we find the old code before port into new Commons components?


The code bases are managed by the "git" software; the whole history is
available:
  https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log

[I'd advise to "clone" the repositories on your local computer, and
use the command line tools.]


I believe you will want to clone the commons-math repositories, but then
develop your own "fork" of the commons-statistics repository. Gilles can
correct me if that is wrong.


Actually, I know only my workflow:
 $ git clone ...
 $ git branch ...
 $ git commit ...
 $ git push

:-}

I didn't find it very easy to cooperate with developers who
fork on GitHub and submit PRs.
I've now found the "git" command that creates a branch from
a PR, but it would be so much more comfortable to just switch
directory and do "git pull".

In the context of GSoC, would it be possible to grant some
privilege to non-committers so that they can update a selected
"git" repository?
If not, what is the next easiest way to share a "common space"
(aka "sandbox") from which it would be easy to copy reviewed
bits over to the official source repository?



As

you mentioned it will be a good approach to redesign process.



You don't necessarily need to analyze how the code was before
the port/refactoring; looking at how it is now is sufficient,
unless you suspect that something is wrong now and might have
been better before. ;-)



In particular, the statistics library was designed before Java 8. Java 8
however has provided both efficient programming strategies for these
statistical methods (in the form of lambdas and streams) as well as some
built-in methods providing summary statistics functions (see discussion at
http://markmail.org/message/7t2mjaprsuvb3waj).


Very good point, indeed.
IMO, the new component should target Java 8.
Even Java 9 (enforcing modularity with JPMS): if by the time we think
of releasing the code, we still want to avoid "multi-release" JARs, it
will be easy to just remove the "module-info" files (I don't think much
else Java 9 specific would be used by "Commons Statistics").

In fact, given the very slow pace at which new components are being
brought to releasable state, I'd like to ask whether it would be OK
to make "incremental" releases?  That would mean: focus on (maven)
modules that seem close to feature-complete and bug-free, fix the
remaining issues and perform a release with that module added.

It seems that the expectations were set too high (content-wise given
the amount of human resources), so that neither CM can be released
(too many non-fixed issues) nor its "Commons Numbers" spin-off that
contains many modules, some of which are blocked by lack of consensus
or dangling discussions.

It probably makes sense, as a design strategy, to separate the function
implementation from the streaming implementation. For example, a 2D integer
array will probably require a different streaming implementation than a 1D
double array, but they can probably both be passed the same function
handle to collect, say, the mean or max value.

The role of commons might then be to provide a convenient interface, so
that the user can simply call a static method like SummaryStats.mean() and
not have to worry about the implementation.

The other difficulty I see is that quantile and median statistics will not
be as easy to stream as statistics with a closed-form solution like mean or
variance. There may however be great algorithms out there for pulling the
median or the 95% quantile out of a stream -- if so they should be used.


Eric


Eric,

Would you be the official "mentor" for the GSoC participants that
are interested in helping with the porting of "o.a.c.math4.stat"?

Thank you,
Gilles


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Statistics]Port codes from Commons Math

2018-03-13 Thread Eric Barnhill
On Tue, Mar 13, 2018 at 12:47 AM, Gilles 
wrote:

>
>>
>> Where can we find the old code before port into new Commons components?
>>
>
> The code bases are managed by the "git" software; the whole history is
> available:
>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>
> [I'd advise to "clone" the repositories on your local computer, and
> use the command line tools.]


I believe you will want to clone the commons-math repositories, but then
develop your own "fork" of the commons-statistics repository. Gilles can
correct me if that is wrong.


>
>
> As
>> you mentioned it will be a good approach to redesign process.
>>
>
> You don't necessarily need to analyze how the code was before
> the port/refactoring; looking at how it is now is sufficient,
> unless you suspect that something is wrong now and might have
> been better before. ;-)
>

In particular, the statistics library was designed before Java 8. Java 8
however has provided both efficient programming strategies for these
statistical methods (in the form of lambdas and streams) as well as some
built-in methods providing summary statistics functions (see discussion at
http://markmail.org/message/7t2mjaprsuvb3waj).

It probably makes sense, as a design strategy, to separate the function
implementation from the streaming implementation. For example, a 2D integer
array will probably require a different streaming implementation than a 1D
double array, but they can probably both be passed the same function
handle to collect, say, the mean or max value.

The role of commons might then be to provide a convenient interface, so
that the user can simply call a static method like SummaryStats.mean() and
not have to worry about the implementation.
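
As a rough illustration of the separation described above (a sketch only, not
code from Commons Statistics): the reduction is written once, and each array
shape gets its own small streaming adapter. The class name SummaryStatsSketch
and all of its methods are hypothetical; the only real API used is the JDK 8
java.util.Arrays, DoubleStream and DoubleSummaryStatistics.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

/** Hypothetical facade; the name and the methods are illustrative assumptions. */
final class SummaryStatsSketch {
    private SummaryStatsSketch() {}

    /** The "function" part: one reduction, independent of the data layout. */
    static DoubleSummaryStatistics summarize(DoubleStream values) {
        return values.summaryStatistics();
    }

    /** Streaming adapter for a 1D double array. */
    static DoubleStream stream(double[] data) {
        return Arrays.stream(data);
    }

    /** Streaming adapter for a 2D int array: flatten the rows, widen to double. */
    static DoubleStream stream(int[][] data) {
        return Arrays.stream(data)
                     .flatMapToInt(Arrays::stream)
                     .asDoubleStream();
    }

    /** Convenience entry points, so the caller never touches the stream API. */
    static double mean(double[] data) {
        return summarize(stream(data)).getAverage();
    }

    static double mean(int[][] data) {
        return summarize(stream(data)).getAverage();
    }

    static double max(int[][] data) {
        return summarize(stream(data)).getMax();
    }
}
```

With this layout, supporting another container would only mean adding one more
stream(...) adapter; the reduction itself stays untouched.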

The other difficulty I see is that quantile and median statistics will not
be as easy to stream as statistics with a closed-form solution like mean or
variance. There may however be great algorithms out there for pulling the
median or the 95% quantile out of a stream -- if so they should be used.
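
To make the "easy to stream" case concrete, here is a minimal sketch of a
one-pass mean/variance accumulator using Welford's update, with a merge step
so that it can also be used on parallel streams. The class RunningMoments and
its method names are assumptions made for this example, not an existing
Commons API; DoubleStream.collect is the standard JDK 8 call.

```java
import java.util.function.DoubleConsumer;
import java.util.stream.DoubleStream;

/** Hypothetical accumulator: constant memory, one pass over the data. */
final class RunningMoments implements DoubleConsumer {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    /** Welford's update for one new value. */
    @Override
    public void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another accumulator into this one (pairwise combination). */
    void combine(RunningMoments other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long count() {
        return n;
    }

    double mean() {
        return n == 0 ? Double.NaN : mean;
    }

    double sampleVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Collects a stream into an accumulator. */
    static RunningMoments of(DoubleStream values) {
        return values.collect(RunningMoments::new,
                              RunningMoments::accept,
                              RunningMoments::combine);
    }
}
```

The median has no comparable fixed-size exact update, which is the difficulty
mentioned above; bounded-memory estimators such as the P-squared algorithm or
t-digest give approximations and would need to be evaluated separately.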

Eric


Re: [Statistics]Port codes from Commons Math

2018-03-12 Thread Gilles

On Mon, 12 Mar 2018 08:43:29 +0530, Gimhana Nadeeshan wrote:

Hi devs,

Thanks Gilles for the ideas.

Now I have an idea of what to do. I went through the code in

https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/main/java/org/apache/commons/math4/stat

and I could identify the coupling hierarchy at the top level. I would
like to start with Confidence Interval, which seems to have only minor
dependencies within the class itself.
How can I begin contributing? Could you please share the repo links
corresponding to CM Statistics?

Where can we find the old code before port into new Commons 
components?


The code bases are managed by the "git" software; the whole history is
available:
  https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log

[I'd advise to "clone" the repositories on your local computer, and
use the command line tools.]


As
you mentioned it will be a good approach to redesign process.


You don't necessarily need to analyze how the code was before
the port/refactoring; looking at how it is now is sufficient,
unless you suspect that something is wrong now and might have
been better before. ;-)

Don't hesitate to post here with your suggestions.


To get a good
comparison, links of CM's Random Package code repo and current CM RNG
component.


I think that you can already get a pretty good idea of the evolution
by comparing the generated "apidocs":
  http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/random/package-summary.html

vs
  http://commons.apache.org/proper/commons-rng/commons-rng-client-api/javadocs/api-1.0/index.html
  http://commons.apache.org/proper/commons-rng/commons-rng-core/javadocs/api-1.0/index.html
  http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.0/index.html


Regards,
Gilles


Regards,
Gimhana


On 11 March 2018 at 17:07, Gilles  
wrote:



Hello.

On Sun, 11 Mar 2018 08:30:02 +0530, Gimhana Nadeeshan wrote:


Hi devs,

I am an 3rd year Computer Science and Engineering undergraduate of
University of Moratuwa and I am interested in mathematics so much. 
So I

would like to work on porting codes from Commons Math to Commons
Statistics
component as my GSOC 2018 project.



Welcome!

So How to get a head start on this problem?




The big picture is that the "Commons Math" code must be
split into either new components (as "sub-projects" are
called within the "Apache Commons" project), or Maven modules
within "Commons Math" (CM).
Which of the alternatives applies depends on whether a "scope" (or
"subject matter") can be clearly identified, and whether a
fairly broad usefulness can be assumed.

Practically, you can see how the premise turned out for
functionalities that were/are already in the porting
process:
 * CM's "random" package -> component "Commons RNG"[1]
 * CM's "complex", "fraction", "util", "primes", "special"
   packages -> modules in component "Commons Numbers"[2]
 * CM's "distribution" package -> "distribution" module
   in "Commons Statistics"[3]

At least one other CM package would make an obvious new
component: "geometry".

What should I port first




The goal is modularization (for easier usage, maintenance,
and development).
The modules must not have circular dependencies.  Hence
the first step is to identify dependencies and define
the "boundaries" of purported modules.

The easiest is of course to define modules that have
zero dependencies.
Then, modules that depend on those.
And so on, up the hierarchy.
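
The layering described above can be sketched as a small helper that takes a
"module -> modules it depends on" map and returns the modules layer by layer,
rejecting circular dependencies. This is only an illustration: the class
ModuleLayers and any module names passed to it (for example "core",
"interval") are hypothetical and do not describe the actual build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper that orders modules by dependency depth. */
final class ModuleLayers {
    private ModuleLayers() {}

    /**
     * Layer 0 holds modules with zero dependencies; layer k holds modules
     * that depend only on modules from layers below k.
     *
     * @throws IllegalArgumentException on a cycle or an undeclared module
     */
    static List<Set<String>> layers(Map<String, Set<String>> dependencies) {
        // Sorted, mutable copy so that the result is deterministic.
        Map<String, Set<String>> remaining = new TreeMap<>();
        dependencies.forEach((module, deps) -> remaining.put(module, new TreeSet<>(deps)));

        for (Set<String> deps : remaining.values()) {
            if (!remaining.keySet().containsAll(deps)) {
                throw new IllegalArgumentException("dependency on an undeclared module");
            }
        }

        List<Set<String>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Modules whose dependencies have all been placed already.
            Set<String> layer = new TreeSet<>();
            for (Map.Entry<String, Set<String>> e : remaining.entrySet()) {
                if (e.getValue().isEmpty()) {
                    layer.add(e.getKey());
                }
            }
            if (layer.isEmpty()) {
                throw new IllegalArgumentException(
                    "circular dependency among " + remaining.keySet());
            }
            remaining.keySet().removeAll(layer);
            for (Set<String> deps : remaining.values()) {
                deps.removeAll(layer);
            }
            result.add(layer);
        }
        return result;
    }
}
```

Porting in the order of the returned layers would mean that each module only
ever depends on code that has already been moved.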

In practice, each ported functionality usually becomes
a dependency of CM (whose unit test suites should still
pass when they use the ported code).

Dependency on other "Commons" components is allowed;
runtime dependency on external libraries other than
the JDK is not.

and

how to redesign it?



I'm afraid there is no single answer.

Personally, I don't have a clear idea of what the grand
vision should be.  Do you have suggestions?
It would certainly be helpful to have a summary of the
design principles used in other (OO) libraries.
Guidelines could also perhaps be deduced from reported
bugs, some of which are mentioned on the page of the
GSoC report.[4]

I hope that other people reading this will chime in and
help draw a concrete plan.

Best regards,
Gilles

Best Regards,

Gimhana.



[1] https://commons.apache.org/rng
[2] https://commons.apache.org/numbers
[3] https://commons.apache.org/proper/commons-statistics/
[4] https://issues.apache.org/jira/browse/STATISTICS-5




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Statistics]Port codes from Commons Math

2018-03-11 Thread Gimhana Nadeeshan
Hi devs,

Thanks Gilles for the ideas.

Now I have an idea of what to do. I went through the code in
https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/main/java/org/apache/commons/math4/stat

I could identify the coupling hierarchy at the top level, so I would
like to start with the confidence interval classes. They seem to have
only minor dependencies within the classes themselves.
How can I begin contributing? Could you please share the repository links
corresponding to the CM statistics code?

Where can we find the old code as it was before being ported into the new
Commons components? As you mentioned, studying it would be a good way to
approach the redesign process. For a good comparison, could you share
links to the code of CM's "random" package and to the current Commons RNG
component?
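
To illustrate what I mean by "minor dependencies", here is a minimal
sketch of a stand-alone interval value class. It is not the Commons Math
code: the class name "SimpleInterval" and its validation rules are
assumptions made for illustration, and it relies only on the JDK (with
IllegalArgumentException for invalid arguments).

```java
/** Hypothetical, dependency-free confidence interval value class. */
public final class SimpleInterval {
    private final double lowerBound;
    private final double upperBound;
    private final double confidenceLevel;

    /**
     * @param lowerBound lower end of the interval
     * @param upperBound upper end of the interval, must exceed lowerBound
     * @param confidenceLevel coverage probability, strictly between 0 and 1
     * @throws IllegalArgumentException if the arguments are inconsistent
     */
    public SimpleInterval(double lowerBound, double upperBound, double confidenceLevel) {
        // The negated comparisons also reject NaN values.
        if (!(lowerBound < upperBound)) {
            throw new IllegalArgumentException(
                "Lower bound " + lowerBound + " must be less than upper bound " + upperBound);
        }
        if (!(confidenceLevel > 0 && confidenceLevel < 1)) {
            throw new IllegalArgumentException(
                "Confidence level " + confidenceLevel + " must be in (0, 1)");
        }
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
        this.confidenceLevel = confidenceLevel;
    }

    public double getLowerBound() {
        return lowerBound;
    }

    public double getUpperBound() {
        return upperBound;
    }

    public double getConfidenceLevel() {
        return confidenceLevel;
    }

    @Override
    public String toString() {
        return "[" + lowerBound + ";" + upperBound + "] (confidence level:"
            + Math.round(confidenceLevel * 100) + "%)";
    }
}
```

A class of this shape needs nothing beyond the JDK, which is why it looks
like a convenient first candidate for porting.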

Regards,
Gimhana


On 11 March 2018 at 17:07, Gilles  wrote:

> Hello.
>
> On Sun, 11 Mar 2018 08:30:02 +0530, Gimhana Nadeeshan wrote:
>
>> Hi devs,
>>
>> I am a 3rd-year Computer Science and Engineering undergraduate at the
>> University of Moratuwa, and I am very interested in mathematics. So I
>> would like to work on porting code from Commons Math to the Commons
>> Statistics component as my GSoC 2018 project.
>>
>
> Welcome!
>
> So how can I get a head start on this problem?
>>
>
> The big picture is that the "Commons Math" code must be
> split into either new components (as "sub-projects" are
> called within the "Apache Commons" project), or Maven modules
> within "Commons Math" (CM).
> Which of the alternatives applies depends on whether a "scope"
> (or "subject matter") can be clearly identified, and whether a
> fairly broad usefulness can be assumed.
>
> Practically, you can see how the premise turned out for
> functionalities that were/are already in the porting
> process:
>  * CM's "random" package -> component "Commons RNG"[1]
>  * CM's "complex", "fraction", "util", "primes", "special"
>    packages -> modules in component "Commons Numbers"[2]
>  * CM's "distribution" package -> "distribution" module
>    in "Commons Statistics"[3]
>
> At least one other CM package would make an obvious new
> component: "geometry".
>
> What should I port first
>>
>
> The goal is modularization (for easier usage, maintenance,
> and development).
> The modules must not have circular dependencies.  Hence
> the first step is to identify dependencies and define
> the "boundaries" of purported modules.
>
> The easiest is of course to define modules that have
> zero dependencies.
> Then, modules that depend on those.
> And so on, up the hierarchy.
>
> In practice, each ported functionality usually becomes
> a dependency of CM (whose unit test suites should still
> pass when they use the ported code).
>
> Dependency on other "Commons" components is allowed;
> runtime dependency on external libraries other than
> the JDK is not.
>
> and
>> how should I redesign it?
>>
>
> I'm afraid there is no single answer.
>
> Personally, I don't have a clear idea of what the grand
> vision should be.  Do you have suggestions?
> It would certainly be helpful to have a summary of the
> design principles used in other (OO) libraries.
> Guidelines could also perhaps be deduced from reported
> bugs, some of which are mentioned on the page of the
> GSoC report.[4]
>
> I hope that other people reading this will chime in and
> help draw a concrete plan.
>
> Best regards,
> Gilles
>
> Best Regards,
>> Gimhana.
>>
>
> [1] https://commons.apache.org/rng
> [2] https://commons.apache.org/numbers
> [3] https://commons.apache.org/proper/commons-statistics/
> [4] https://issues.apache.org/jira/browse/STATISTICS-5
>


-- 
Nadeeshan Gimhana
Batch Representative (15' batch)
Department of Computer Science & Engineering
University of Moratuwa
Mobile: +94775744613
Website: https://ngimhana94.wixsite.com/gimhanadesilva/
LinkedIn: www.linkedin.com/in/nadeeshangimhana/


Re: [Statistics]Port codes from Commons Math

2018-03-11 Thread Gilles

Hello.

On Sun, 11 Mar 2018 08:30:02 +0530, Gimhana Nadeeshan wrote:

Hi devs,

I am a 3rd-year Computer Science and Engineering undergraduate at the
University of Moratuwa, and I am very interested in mathematics. So I
would like to work on porting code from Commons Math to the Commons
Statistics component as my GSoC 2018 project.


Welcome!


So how can I get a head start on this problem?


The big picture is that the "Commons Math" code must be
split into either new components (as "sub-projects" are
called within the "Apache Commons" project), or Maven modules
within "Commons Math" (CM).
Which of the alternatives applies depends on whether a "scope"
(or "subject matter") can be clearly identified, and whether a
fairly broad usefulness can be assumed.

Practically, you can see how the premise turned out for
functionalities that were/are already in the porting
process:
 * CM's "random" package -> component "Commons RNG"[1]
 * CM's "complex", "fraction", "util", "primes", "special"
   packages -> modules in component "Commons Numbers"[2]
 * CM's "distribution" package -> "distribution" module
   in "Commons Statistics"[3]

At least one other CM package would make an obvious new
component: "geometry".


What should I port first


The goal is modularization (for easier usage, maintenance,
and development).
The modules must not have circular dependencies.  Hence
the first step is to identify dependencies and define
the "boundaries" of purported modules.

The easiest is of course to define modules that have
zero dependencies.
Then, modules that depend on those.
And so on, up the hierarchy.

In practice, each ported functionality usually becomes
a dependency of CM (whose unit test suites should still
pass when they use the ported code).

Dependency on other "Commons" components is allowed;
runtime dependency on external libraries other than
the JDK is not.


and how should I redesign it?


I'm afraid there is no single answer.

Personally, I don't have a clear idea of what the grand
vision should be.  Do you have suggestions?
It would certainly be helpful to have a summary of the
design principles used in other (OO) libraries.
Guidelines could also perhaps be deduced from reported
bugs, some of which are mentioned on the page of the
GSoC report.[4]

I hope that other people reading this will chime in and
help draw a concrete plan.

Best regards,
Gilles


Best Regards,
Gimhana.


[1] https://commons.apache.org/rng
[2] https://commons.apache.org/numbers
[3] https://commons.apache.org/proper/commons-statistics/
[4] https://issues.apache.org/jira/browse/STATISTICS-5

