Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

2020-03-21 Thread Gilles Sadowski

Hello.

2020-03-21 4:02 UTC+01:00, chentao...@qq.com :
> Hi,
>
>>2020-03-21 2:59 UTC+01:00, chentao...@qq.com :
>>> Hi,
>>>
>>>>On Fri, 20 Mar 2020 at 04:47, chentao...@qq.com wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> >Hello.
>>>>> >
>>>>> >On Wed, 18 Mar 2020 at 17:57, Gilles Sadowski wrote:
>>>>> >>
>>>>> >> Hi.
>>>>> >>
>>>>> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com :
>>>>> >> > Hi,
>>>>> >> > I have created a PR to show my aim:
>>>>> >> > https://github.com/apache/commons-math/pull/126/files
>>>>> >>
>>>>> >> Am I correct that the implementations of "ClustersPointExtractor"
>>>>> >> modify the argument of the "extract" method?
>>>>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>>>>> >> in a public API.
>>>>> >
>>>>> >To be clearer (hopefully): I'd contrast this
>>>>> > (incomplete/non-compilable
>>>>> >code):
>>>>> >
>>>>> >public class Extractor {
>>>>> >T extract(List> list);
>>>>> >}
>>>>>
>>>>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>>>>  and related logic only used in K-Means.
>>>>> And the "Extractor" remove a point from exists Clusters is indeed
>>>>> unsuitable
>>>>> to be a public API(as you mentioned before.)
>>>>> I think another choice is simple make "EmptyClusterStrategy" and
>>>>> related
>>>>> logic protected,
>>>>> then it can be resuse by MiniBatchKMeans.
>>>>>
>>>>> Also after the "CentroidInitializer" has been move out, the
>>>>> "KMeansPlusPlusClusterer"
>>>>>  should rename to "KMeansClusterer" with a construct parameter
>>>>> "CentroidInitializer"
>>>>
>>>>If you think that it makes sense to define a class hierarchy for
>>>>"KMeans"-based algorithms, please do so.  But note that "protected"
>>>>is only slightly better than "public" (from a maintenance POV, it isn't
>>>>better, since we'll be required to maintain compatibility all the same).
>>>>Perhaps the base classe(s) can be package-private...
>>>
>>> OK, package-private on "EmptyClusterStrategy" and related logic is fine.
>>>
>>>>
>>>>> >
>>>>> >with something along those lines:
>>>>> >
>>>>> >public class ClusterSet {
>>>>> >private Set data;
>>>>> >private List> clusters;
>>>>> >
>>>>> >void remove(T e) {
>>>>> >return data.remove(e);
>>>>> >}
>>>>> >
>>>>> >public interface Visitor {
>>>>> >T visit(List> list);
>>>>> >}
>>>>> >}
>>>>> >
>>>>> >Key point is separation of concern (selection of element vs
>>>>> >removal of element).
>>>>>
>>>>> I propose the ClusterSet should has more member like "distanceMeasure"
>>>>>  to store origin clustering parameters,
>>>>> or  to accelerate or support cluster finding or point finding.
>>>>> ```java
>>>>> public class ClusterSet {
>>>>> private List clusters;
>>>>> private DistanceMeasure measure;
>>>>> ...
>>>>> public List closestTopN(Clusterable point);
>>>>
>>>>How about providing a comparator factory?
>>>>
>>>>public  static
>>>> Comparator> getComparator(final Clusterable p, final
>>>>DistanceMeasure m) {
>>>>return new Comparator<>() {
>>>>public int compare(Cluster c1, Cluster c2) {
>>>>final double d1 = c1.distance(p, m); // "distance" method
>>>>to be added to the interface (?).
>>>>final double d2 = c2.distance(p, m);

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

2020-03-20 Thread Gilles Sadowski

2020-03-21 2:59 UTC+01:00, chentao...@qq.com :
> Hi,
>
>>On Fri, 20 Mar 2020 at 04:47, chentao...@qq.com wrote:
>>>
>>> Hi,
>>>
>>> >Hello.
>>> >
>>> >On Wed, 18 Mar 2020 at 17:57, Gilles Sadowski wrote:
>>> >>
>>> >> Hi.
>>> >>
>>> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com :
>>> >> > Hi,
>>> >> > I have created a PR to show my aim:
>>> >> > https://github.com/apache/commons-math/pull/126/files
>>> >>
>>> >> Am I correct that the implementations of "ClustersPointExtractor"
>>> >> modify the argument of the "extract" method?
>>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>>> >> in a public API.
>>> >
>>> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>>> >code):
>>> >
>>> >public class Extractor {
>>> >T extract(List> list);
>>> >}
>>>
>>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>>  and related logic only used in K-Means.
>>> And the "Extractor" remove a point from exists Clusters is indeed
>>> unsuitable
>>> to be a public API(as you mentioned before.)
>>> I think another choice is simple make "EmptyClusterStrategy" and related
>>> logic protected,
>>> then it can be resuse by MiniBatchKMeans.
>>>
>>> Also after the "CentroidInitializer" has been move out, the
>>> "KMeansPlusPlusClusterer"
>>>  should rename to "KMeansClusterer" with a construct parameter
>>> "CentroidInitializer"
>>
>>If you think that it makes sense to define a class hierarchy for
>>"KMeans"-based algorithms, please do so.  But note that "protected"
>>is only slightly better than "public" (from a maintenance POV, it isn't
>>better, since we'll be required to maintain compatibility all the same).
>>Perhaps the base classe(s) can be package-private...
>
> OK, package-private on "EmptyClusterStrategy" and related logic is fine.
>
>>
>>> >
>>> >with something along those lines:
>>> >
>>> >public class ClusterSet {
>>> >private Set data;
>>> >private List> clusters;
>>> >
>>> >void remove(T e) {
>>> >return data.remove(e);
>>> >}
>>> >
>>> >public interface Visitor {
>>> >T visit(List> list);
>>> >}
>>> >}
>>> >
>>> >Key point is separation of concern (selection of element vs
>>> >removal of element).
>>>
>>> I propose the ClusterSet should has more member like "distanceMeasure"
>>>  to store origin clustering parameters,
>>> or  to accelerate or support cluster finding or point finding.
>>> ```java
>>> public class ClusterSet {
>>> private List clusters;
>>> private DistanceMeasure measure;
>>> ...
>>> public List closestTopN(Clusterable point);
>>
>>How about providing a comparator factory?
>>
>>public  static
>> Comparator> getComparator(final Clusterable p, final
>>DistanceMeasure m) {
>>return new Comparator<>() {
>>public int compare(Cluster c1, Cluster c2) {
>>final double d1 = c1.distance(p, m); // "distance" method
>>to be added to the interface (?).
>>final double d2 = c2.distance(p, m);
>>return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
>>}
>>}
>>}
>
> What do you think about making "distanceMeasure" a member of
> "ClusterSet"?

I don't see why.  A priori "ClusterSet" represents groupings
of data, and the distance function is a parameter that e.g.
determines whether the grouping is good or bad.

> In my opinion "ClusterSet" should contain all the parameters and results for
> clustering and clusters.

IIRC, Alex mentioned "ClusterResult" (?).  IIUC, this would
indeed collect everything necessary to evaluate the quality
of the clustering (i.e. probably also the "DistanceMeasure"
instance used by the clustering algorithm).

>>
>>Then, if also providing an accessor to the clusters:
>>
>>public List> getClusters()

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

2020-03-20 Thread Gilles Sadowski

On Fri, 20 Mar 2020 at 04:47, chentao...@qq.com wrote:
>
> Hi,
>
> >Hello.
> >
> >On Wed, 18 Mar 2020 at 17:57, Gilles Sadowski wrote:
> >>
> >> Hi.
> >>
> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com :
> >> > Hi,
> >> > I have created a PR to show my aim:
> >> > https://github.com/apache/commons-math/pull/126/files
> >>
> >> Am I correct that the implementations of "ClustersPointExtractor"
> >> modify the argument of the "extract" method?
> >> If so, that seems quite unsafe.  I would not expect this behaviour
> >> in a public API.
> >
> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
> >code):
> >
> >public class Extractor {
> >T extract(List> list);
> >}
>
> I have read the existing code again, and I found that "EmptyClusterStrategy"
> and the related logic are only used in K-Means.
> And an "Extractor" that removes a point from existing clusters is indeed
> unsuitable as a public API (as you mentioned before).
> I think another choice is simply to make "EmptyClusterStrategy" and the
> related logic protected, so that it can be reused by MiniBatchKMeans.
>
> Also, after the "CentroidInitializer" has been moved out, the
> "KMeansPlusPlusClusterer" should be renamed to "KMeansClusterer", with a
> "CentroidInitializer" constructor parameter.

If you think that it makes sense to define a class hierarchy for
"KMeans"-based algorithms, please do so.  But note that "protected"
is only slightly better than "public" (from a maintenance POV, it isn't
better, since we'll be required to maintain compatibility all the same).
Perhaps the base class(es) can be package-private...

> >
> >with something along those lines:
> >
> >public class ClusterSet {
> >private Set data;
> >private List> clusters;
> >
> >void remove(T e) {
> >return data.remove(e);
> >}
> >
> >public interface Visitor {
> >T visit(List> list);
> >}
> >}
> >
> >Key point is separation of concern (selection of element vs
> >removal of element).
>
> I propose that ClusterSet should have more members, like "distanceMeasure",
> to store the original clustering parameters,
> or to accelerate or support cluster finding or point finding.
> ```java
> public class ClusterSet {
> private List clusters;
> private DistanceMeasure measure;
> ...
> public List closestTopN(Clusterable point);

How about providing a comparator factory?

public static <T extends Clusterable>
Comparator<Cluster<T>> getComparator(final Clusterable p, final DistanceMeasure m) {
    return new Comparator<Cluster<T>>() {
        public int compare(Cluster<T> c1, Cluster<T> c2) {
            // "distance" method to be added to the interface (?).
            final double d1 = c1.distance(p, m);
            final double d2 = c2.distance(p, m);
            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
        }
    };
}

Then, if also providing an accessor to the clusters:

public List<Cluster<T>> getClusters() {
    return Collections.unmodifiableList(clusters);
}

users can do whatever they want, a.o. easily implement the
ranking method which you suggest:

/** @return clusters in descending order of proximity. */
public <T extends Clusterable> List<Cluster<T>> closest(Clusterable p, ClusterSet<T> s,
                                                        DistanceMeasure m) {
    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
    Collections.sort(cList, s.getComparator(p, m));
    return cList;
}
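
For illustration only, here is a self-contained sketch of the comparator
idea above. The types below ("Clusterable", "DistanceMeasure",
"SimpleCluster") are minimal stand-ins written for this example, not the
Commons Math classes, and the "distance" method on the cluster is the
hypothetical addition mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Stand-in for illustration; not the Commons Math interface. */
interface Clusterable {
    double[] getPoint();
}

/** Stand-in for illustration; not the Commons Math interface. */
interface DistanceMeasure {
    double compute(double[] a, double[] b);
}

/** Hypothetical cluster type that only knows its center. */
final class SimpleCluster {
    private final double[] center;

    SimpleCluster(double... center) {
        this.center = center.clone();
    }

    double[] getCenter() {
        return center.clone();
    }

    /** Assumed "distance" method: distance between the center and p. */
    double distance(Clusterable p, DistanceMeasure m) {
        return m.compute(center, p.getPoint());
    }
}

final class ClusterRankingDemo {
    /** Euclidean distance, written out to keep the sketch dependency-free. */
    static final DistanceMeasure EUCLIDEAN = (a, b) -> {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** Comparator factory: orders clusters by increasing distance to p. */
    static Comparator<SimpleCluster> getComparator(final Clusterable p,
                                                   final DistanceMeasure m) {
        return Comparator.comparingDouble(c -> c.distance(p, m));
    }

    /** @return a new list with the cluster closest to p first. */
    static List<SimpleCluster> closest(Clusterable p,
                                       List<SimpleCluster> clusters,
                                       DistanceMeasure m) {
        final List<SimpleCluster> cList = new ArrayList<>(clusters);
        cList.sort(getComparator(p, m));
        return cList;
    }
}
```

The ranking works on a copy, so the caller's list is left untouched.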

>
> public int pointSize();
> }
> ```

A priori, I'd favour more encapsulation, in order to ensure that
multi-thread access can be controlled, e.g. hide the "ClusterSet"
data-structures, and provide access through identifiers that are
atomically created when clusters are added to the instance, and
so on...
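
A rough sketch of what "access through identifiers" could look like,
assuming nothing about the final design: the class name
"EncapsulatedClusterSet" and its methods are made up for this example.
Identifiers come from an "AtomicInteger", and the stored point lists are
defensive, read-only copies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical container that hides its data structures behind identifiers. */
final class EncapsulatedClusterSet<T> {
    /** Source of identifiers; each call hands out a distinct value. */
    private final AtomicInteger nextId = new AtomicInteger();
    /** Cluster contents, keyed by identifier. */
    private final Map<Integer, List<T>> clusters = new ConcurrentHashMap<>();

    /**
     * Registers a new cluster.
     *
     * @return the identifier atomically assigned to the cluster.
     */
    int addCluster(List<T> points) {
        final int id = nextId.getAndIncrement();
        // Copy, then wrap: later changes to "points" do not leak in,
        // and callers cannot modify what they get back.
        clusters.put(id, Collections.unmodifiableList(new ArrayList<>(points)));
        return id;
    }

    /** @return a read-only view of the points of the given cluster. */
    List<T> getPoints(int id) {
        final List<T> points = clusters.get(id);
        if (points == null) {
            throw new IllegalArgumentException("Unknown cluster id: " + id);
        }
        return points;
    }

    /** @return the number of clusters. */
    int size() {
        return clusters.size();
    }
}
```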

It would be great if you can come up with a concrete (and fully
tested) implementation of "ClusterSet". ;-)

Best,
Gilles

> >
> >Of course the devil is in the details (maintain consistency among
> >the fields of "ClusterSet", ensure that "Visitor" implementations
> >cannot modify the internal state, ...).
> >
> >> Unless I missed some point, I'd ask again that the API be reviewed
> >> *before* implementing several features (such as those "extractors")
> >> on top of something that does not look right.
> >
> >One perspective might be to try and come with a design for use in
> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
> >
> >Best regards,
> >Gilles
> >
> >> >> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

2020-03-19 Thread Gilles Sadowski

Hello.

On Wed, 18 Mar 2020 at 17:57, Gilles Sadowski wrote:
>
> Hi.
>
> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com :
> > Hi,
> > I have created a PR to show my aim:
> > https://github.com/apache/commons-math/pull/126/files
>
> Am I correct that the implementations of "ClustersPointExtractor"
> modify the argument of the "extract" method?
> If so, that seems quite unsafe.  I would not expect this behaviour
> in a public API.

To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
code):

public class Extractor<T extends Clusterable> {
    T extract(List<Cluster<T>> list);
}

with something along those lines:

public class ClusterSet<T extends Clusterable> {
    private Set<T> data;
    private List<Cluster<T>> clusters;

    boolean remove(T e) {
        return data.remove(e);
    }

    public interface Visitor<T extends Clusterable> {
        T visit(List<Cluster<T>> list);
    }
}

The key point is separation of concerns (selection of an element vs.
removal of an element).

Of course the devil is in the details (maintain consistency among
the fields of "ClusterSet", ensure that "Visitor" implementations
cannot modify the internal state, ...).
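
To make the separation a bit more concrete, here is a small
self-contained sketch. It is only an illustration under simplifying
assumptions: clusters are plain lists, and the names
"VisitableClusterSet" and "select" are invented for the example. The
visitor only selects (it sees read-only views); removal goes through
the set itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: selection and removal are separate operations. */
final class VisitableClusterSet<T> {
    private final List<List<T>> clusters = new ArrayList<>();

    /** Selects an element; implementations get read-only views. */
    interface Visitor<E> {
        E visit(List<List<E>> clusters);
    }

    void addCluster(Collection<T> points) {
        clusters.add(new ArrayList<>(points));
    }

    /** Selection: the visitor cannot modify the internal state. */
    T select(Visitor<T> visitor) {
        final List<List<T>> view = new ArrayList<>();
        for (List<T> c : clusters) {
            view.add(Collections.unmodifiableList(c));
        }
        return visitor.visit(Collections.unmodifiableList(view));
    }

    /** Removal: only the set itself changes its contents. */
    boolean remove(T e) {
        for (List<T> c : clusters) {
            if (c.remove(e)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would first "select" a point (e.g. from the largest cluster)
and then ask the set to "remove" it, so that no visitor implementation
needs write access.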

> Unless I missed some point, I'd ask again that the API be reviewed
> *before* implementing several features (such as those "extractors")
> on top of something that does not look right.

One perspective might be to try and come up with a design for use in
multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").

Best regards,
Gilles

> >> [...]


Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

2020-03-18 Thread Gilles Sadowski

Hi.

2020-03-18 15:10 UTC+01:00, chentao...@qq.com :
> Hi,
> I have created a PR to show my aim:
> https://github.com/apache/commons-math/pull/126/files

Am I correct that the implementations of "ClustersPointExtractor"
modify the argument of the "extract" method?
If so, that seems quite unsafe.  I would not expect this behaviour
in a public API.

Unless I missed some point, I'd ask again that the API be reviewed
*before* implementing several features (such as those "extractors")
on top of something that does not look right.

Best regards,
Gilles

>
>>Hello.
>>
>>On Wed, 11 Mar 2020 at 07:28, chentao...@qq.com wrote:
>>>
>>> Hi all,
>>> The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused
>>> MiniBatchKMeansClusterer and other cluster altorithm.
>>> So I think the "EmptyClusterStrategy" should move out from
>>> KMeansPlusPlusClusterer(JIRA issue #MATH-1525).
>>> I am not sure if my design is good or not.
>>
>>I can't say either; please provide more context/explanation
>>about the excerpts below.
>>
>>> I think here should be a interface:
>>>
>>> Solution 1: Explicit indicate the usage by class name and function name.
>>> ```java
>>> @FunctionalInterface
>>> public interface ClusterBreeder {
>>>  T newCenterPoint((final
>>> Collection> clusters);
>>> }
>>
>>What is a "Breeder"?
>>This seems to further complicates the matter; what is a "center" (if there
>>can be old and new ones).
>
> I mean a method to create a new Cluster from existing clusters.
>
>>
>>Regards,
>>Gilles
>>
>>> ...
>>> // Implementations
>>> public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
>>> public MostPopularClusterPointBreeder implements ClusterBreeder{...}
>>> public FarthestPointBreeder implements ClusterBreeder{...}
>>> ...
>>> // Usage
>>> // KMeansPlusPlusClusterer.java
>>> public class KMeansPlusPlusClusterer extends
>>> Clusterer {
>>> ...
>>> private final ClusterBreeder clusterBreeder;
>>> public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>>>final DistanceMeasure measure,
>>>final UniformRandomProvider random,
>>>final ClusterBreeder clusterBreeder) {
>>> ...
>>> this.clusterBreeder=clusterBreeder;
>>> }
>>> ...
>>> public List> cluster(final Collection points) {
>>> ...
>>> if (cluster.getPoints().isEmpty()) {
>>> if (clusterBreeder == null) {
>>> throw new
>>> ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
>>> } else {
>>> newCenter = clusterBreeder.newCenterPoint(clusters);
>>> }
>>> }
>>> ...
>>> }
>>> }
>>> ```
>>>
>>> Solution2: Declare a more generic interface:
>>> ```java
>>> @FunctionalInterface
>>> public interface ClustersPointFinder {
>>>  T find((final Collection>> extends Clusterable>> clusters);
>>> }
>>>
>>> ...
>>> // Implementations
>>> public LargestVarianceClusterPointFinder implements ClustersPointFinder
>>> {...}
>>> public MostPopularClusterPointFinder implements ClustersPointFinder {...}
>>> public FarthestPointFinder implements ClustersPointFinder {...}
>>> ```
>>>
>>> Thanks,
>>> -CT


Re: [math]Discussion: How to move out "EmptyClusterStrategy" from KMeansPlusPlusClusterer

2020-03-11 Thread Gilles Sadowski

Hello.

On Wed, 11 Mar 2020 at 07:28, chentao...@qq.com wrote:
>
> Hi all,
> The "EmptyClusterStrategy" in KMeansPlusPlusClusterer can be reused by
> MiniBatchKMeansClusterer and other clustering algorithms.
> So I think the "EmptyClusterStrategy" should be moved out of
> KMeansPlusPlusClusterer (JIRA issue #MATH-1525).
> I am not sure whether my design is good or not.

I can't say either; please provide more context/explanation
about the excerpts below.

> I think there should be an interface:
>
> Solution 1: Explicitly indicate the usage by class name and function name.
> ```java
> @FunctionalInterface
> public interface ClusterBreeder {
>  T newCenterPoint((final 
> Collection> clusters);
> }

What is a "Breeder"?
This seems to further complicate the matter; what is a "center" (if there
can be old and new ones)?

Regards,
Gilles

> ...
> // Implementations
> public LargestVarianceClusterPointBreeder implements ClusterBreeder{...}
> public MostPopularClusterPointBreeder implements ClusterBreeder{...}
> public FarthestPointBreeder implements ClusterBreeder{...}
> ...
> // Usage
> // KMeansPlusPlusClusterer.java
> public class KMeansPlusPlusClusterer extends 
> Clusterer {
> ...
> private final ClusterBreeder clusterBreeder;
> public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>final DistanceMeasure measure,
>final UniformRandomProvider random,
>final ClusterBreeder clusterBreeder) {
> ...
> this.clusterBreeder=clusterBreeder;
> }
> ...
> public List> cluster(final Collection points) {
> ...
> if (cluster.getPoints().isEmpty()) {
> if (clusterBreeder == null) {
> throw new 
> ConvergenceException(LocalizedFormats.EMPTY_CLUSTER_IN_K_MEANS);
> } else {
> newCenter = clusterBreeder.newCenterPoint(clusters);
> }
> }
> ...
> }
> }
> ```
>
> Solution2: Declare a more generic interface:
> ```java
> @FunctionalInterface
> public interface ClustersPointFinder {
>  T find((final Collection extends Clusterable>> clusters);
> }
>
> ...
> // Implementations
> public LargestVarianceClusterPointFinder implements ClustersPointFinder {...}
> public MostPopularClusterPointFinder implements ClustersPointFinder {...}
> public FarthestPointFinder implements ClustersPointFinder {...}
> ```
>
> Thanks,
> -CT


Re: [math]Discuss: There should be a CalinskiHarabaszClusterEvaluator in ml package

2020-03-07 Thread Gilles Sadowski

Hello.

2020-03-07 14:50 UTC+01:00, chentao...@qq.com :
> Hi,
>
>>> >  [...]
>>> Solution 3  is "ClusterRanking".
>>> In cases where the reference algorithm would assume the
>>> other convention (i.e. "lower is better"), the implementation
>>> is required to apply a conversion (e.g. return the opposite).
>>
>>s/opposite/inverse/
>>
>>[We should probably enforce that ranking is positive.]
>>
>
> How do we handle this situation with different ranking rules (lower is better, or
> higher is better):
> ```java
> if (evaluator.isBetterScore(varianceSum, bestVarianceSum)) {
> // this one is the best we have found so far, remember it
> best= clusters;
> bestVarianceSum = varianceSum;
> }
> ```

I propose that the new API forbids alternative interpretations
of the "score".  Higher will always mean better.
I've just sent another post for discussing enhancements to
the clustering API.  The functional interface presented earlier
is the first topic in that other thread; let's discuss further over
there.
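
As a sketch of the "higher is always better" convention (the interface
name "Ranking" and the helper methods are assumptions made for this
example, not the API filed in JIRA): a score for which lower is better,
such as a positive sum of variances, is converted by taking its inverse,
and the selection code then needs a single comparison rule.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical ranking: a higher value always means a better clustering. */
@FunctionalInterface
interface Ranking<C> {
    double compute(C clustering);

    /**
     * Adapts a positive "lower is better" score by returning its inverse.
     * A score of zero maps to positive infinity, i.e. the best rank.
     */
    static <C> Ranking<C> fromLowerIsBetter(ToDoubleFunction<C> score) {
        return c -> 1d / score.applyAsDouble(c);
    }

    /** @return the candidate with the highest rank, or null if there is none. */
    static <C> C best(List<C> candidates, Ranking<C> ranking) {
        C best = null;
        double bestRank = Double.NEGATIVE_INFINITY;
        for (C c : candidates) {
            final double rank = ranking.compute(c);
            if (rank > bestRank) {
                // Best found so far: remember it.
                best = c;
                bestRank = rank;
            }
        }
        return best;
    }
}
```

With this, a caller no longer needs an "isBetterScore" method: "rank >
bestRank" is the only test.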

Regards,
Gilles


[Math] Enhance clustering API

2020-03-07 Thread Gilles Sadowski

Hello.

I've collected[1] a series of old issues reported against code in the
  org.apache.commons.math4.ml.clustering
package.

The first proposed improvement[2] to the API results from a discussion
in another recent thread.
Any objection to that change?

Best,
Gilles

[1] https://issues.apache.org/jira/browse/MATH-1515
[2] https://issues.apache.org/jira/browse/MATH-1516


Re: [math]Discuss: There should be a CalinskiHarabaszClusterEvaluator in ml package

2020-03-07 Thread Gilles Sadowski

> >  [...]
> Solution 3  is "ClusterRanking".
> In cases where the reference algorithm would assume the
> other convention (i.e. "lower is better"), the implementation
> is required to apply a conversion (e.g. return the opposite).

s/opposite/inverse/

[We should probably enforce that ranking is positive.]

Gilles

>>> [...]


Re: [math]Discuss: There should be a CalinskiHarabaszClusterEvaluator in ml package

2020-03-07 Thread Gilles Sadowski

".
In cases where the reference algorithm would assume the
other convention (i.e. "lower is better"), the implementation
is required to apply a conversion (e.g. return the opposite).

>>> >> }
>>> >> ```
>>> >>
>>> >> The code can be implemented by read the algorithm documents,
>>> >> or translate from python sklearn.metrics.calinski_harabasz_score.
>>> >
>>> >What's the license of that code?
>>>
>>> The sklearn is under the BSD license.
>>
>>OK; no problem[1] to have claimed inspiration then. ;-)
>>
>>Please note that, for tracking purpose, your PR should be tied
>>to a JIRA report, and the issue's identifier should prefix the
>>commit message.
>
> This PR is for discussion; I will create a JIRA issue.

Proposals and conceptual discussions are posted here (in this
case, you proposed to contribute a new "ClusterEvaluator".)
If there are no objections, a JIRA report should then be filed, to
which you can attach PRs (this has the advantage of being
automatically tracked).

> But I still do not know how we reach a conclusion.

Implementation details will be discussed in JIRA comments
(not GitHub as far as I'm concerned).
[Issues that have more far-reaching consequences can be
posted back here.]

>
>>The PR is also not in sync with current "master" branch.
>
> Which branch should I pull?

"master".
You could probably "rebase" your branch on it; try
  $ git rebase master


Regards,
Gilles

>>> [...]


Re: Re: [math]Discuss: There should be a CalinskiHarabaszClusterEvaluator in ml package

2020-03-06 Thread Gilles Sadowski

On Fri, 6 Mar 2020 at 14:35, chentao...@qq.com wrote:
>
> Hi,
>
> >Hello.
> >
> >2020-03-06 9:48 UTC+01:00, chentao...@qq.com :
> >> Hi,
> >> For machine learning centroid cluster algorithm, we often use is
> >> Calinsk-iHarabasz score to evaluate which algorithm or how many centers is
> >> best for a dataset.
> >> The python lib sklearn implements Calinsk-iHarabasz as
> >> sklearn.metrics.calinski_harabasz_score.
> >
> >Could you post a reference (most of our documentation points
> >to "Wikipedia" or "MathWorld")?
>
> "Calinski-Harabasz" is the most popular evaluator for centroid clusters, as far as I
> know.
> I just read the sklearn code, and I think it is easy to implement.
> https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html
> https://www.tandfonline.com/doi/abs/10.1080/03610927408827101

Thanks; the original reference is quite fine too.

> >
> >> I think there should be a CalinskiHarabaszClusterEvaluator in commons math:
> >
> >At first sight, the approach would be to define a functional
> >interface (with the "score" method).
> >Then an "enum" that would be a factory of evaluators, along
> >the lines of what has been done in "Commons RNG" (see class
> >"RandomSource"[1]).
>
> I just inherited the design of "ClusterEvaluator",
> and I think changing the design of the existing API is another question.

Not really: IMHO we should not pile features on top of an
API that might have shortcomings.  In particular, the fact
that the new class's constructor calls the parent's constructor
with "null" looks problematic to me.

> >
> >> ```java
> >> package org.apache.commons.math4.ml.clustering.evaluation;
> >>
> >> import org.apache.commons.math4.ml.clustering.Cluster;
> >> import org.apache.commons.math4.ml.clustering.Clusterable;
> >>
> >> import java.util.List;
> >>
> >> public class CalinskiHarabaszClusterEvaluator 
> >> extends
> >> ClusterEvaluator {
> >> @Override
> >> public double score(List> clusters) {
> >> //TODO: Implement the Calinski-Harabasz Score algorithm
> >> return 0;
> >> }
> >>
> >> @Override
> >> public boolean isBetterScore(double score1, double score2) {
> >> return score1 > score2;
> >> }
> >
> >This method does not seem very useful.

I've now seen how this is used by "MultiKMeansPlusPlusClusterer".
However, I wonder why the "Multi" feature is only available for that
implementation...

> >> }
> >> ```
> >>
> >> The code can be implemented by read the algorithm documents,
> >> or translate from python sklearn.metrics.calinski_harabasz_score.
> >
> >What's the license of that code?
>
> The sklearn is under the BSD license.

OK; no problem[1] to have claimed inspiration then. ;-)

Please note that, for tracking purpose, your PR should be tied
to a JIRA report, and the issue's identifier should prefix the
commit message.
The PR is also not in sync with current "master" branch.

Regards,
Gilles

[1] http://www.apache.org/legal/resolved.html#category-a

> I think math ml reference the sklearn so much,
> for example: org.apache.commons.math4.userguide.ClusterAlgorithmComparison
>
> >
> >Regards,
> >Gilles
> >
> >[1] 
> >https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html


Re: [math]Discuss: There should be a CalinskiHarabaszClusterEvaluator in ml package

2020-03-06 Thread Gilles Sadowski

Hello.

2020-03-06 9:48 UTC+01:00, chentao...@qq.com :
> Hi,
> For machine learning centroid clustering algorithms, we often use the
> Calinski-Harabasz score to evaluate which algorithm or how many centers are
> best for a dataset.
> The Python library sklearn implements Calinski-Harabasz as
> sklearn.metrics.calinski_harabasz_score.

Could you post a reference (most of our documentation points
to "Wikipedia" or "MathWorld")?

> I think there should be a CalinskiHarabaszClusterEvaluator in commons math:

At first sight, the approach would be to define a functional
interface (with the "score" method).
Then an "enum" that would be a factory of evaluators, along
the lines of what has been done in "Commons RNG" (see class
"RandomSource"[1]).

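A minimal sketch of that approach, to fix ideas. Everything here is an
assumption made for the example: clusters are reduced to arrays of 1-D
values, the names "ClusterEvaluator" and "ClusterEvaluatorSource" are
placeholders, and the single enum constant computes a simple negated
within-cluster sum of squares rather than the Calinski-Harabasz score.

```java
import java.util.List;

/** Functional interface with the single "score" method. */
@FunctionalInterface
interface ClusterEvaluator {
    /** @param clusters one array of 1-D values per cluster. */
    double score(List<double[]> clusters);
}

/** Hypothetical enum acting as a factory of evaluators. */
enum ClusterEvaluatorSource {
    /** Negated within-cluster sum of squares (higher is better). */
    NEGATED_WITHIN_SUM_OF_SQUARES {
        @Override
        ClusterEvaluator create() {
            return clusters -> {
                double sum = 0;
                for (double[] c : clusters) {
                    // Mean of the cluster.
                    double mean = 0;
                    for (double x : c) {
                        mean += x;
                    }
                    mean /= c.length;
                    // Squared deviations from the mean.
                    for (double x : c) {
                        sum += (x - mean) * (x - mean);
                    }
                }
                return -sum;
            };
        }
    };

    /** @return a new evaluator of the selected kind. */
    abstract ClusterEvaluator create();
}
```

New evaluators are then added as enum constants, without a new public
class for each of them.
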
> ```java
> package org.apache.commons.math4.ml.clustering.evaluation;
>
> import org.apache.commons.math4.ml.clustering.Cluster;
> import org.apache.commons.math4.ml.clustering.Clusterable;
>
> import java.util.List;
>
> public class CalinskiHarabaszClusterEvaluator extends
> ClusterEvaluator {
> @Override
> public double score(List> clusters) {
> //TODO: Implement the Calinski-Harabasz Score algorithm
> return 0;
> }
>
> @Override
> public boolean isBetterScore(double score1, double score2) {
> return score1 > score2;
> }

This method does not seem very useful.

> }
> ```
>
> The code can be implemented by reading the algorithm documents,
> or by translating from Python's sklearn.metrics.calinski_harabasz_score.

What's the license of that code?

Regards,
Gilles

[1] 
https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html


Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-03-05 Thread Gilles Sadowski

Hello.

On Thu, 5 Mar 2020 at 13:23, chentao...@qq.com wrote:
>
> [...]
> It is an API change; how do we start it?

I'd suggest putting the new API in a (temporary) package
  org.apache.commons.math4.ml.clustering2

Regards,
Gilles

> [...]


Re: [lang] NPE vs IAE

2020-03-05 Thread Gilles Sadowski

On Thu, 5 Mar 2020 at 19:04, Peter Verhas wrote:
>
> Just my 2c:
>
> When you have an assertion in the code that throws an exception, be it NPE
> or IAE then your code is less prone to errors during maintenance. In that
> case, the assertion is there in the code, kind of documenting it. If the
> assertion is not there with the reasoning that the subsequent code is
> throwing an NPE anyway then later a code change may alter the behavior,
> which may not be desired, and/or followed by the documentation.
>
> In my opinion, these assertions serve documentation purposes and are an
> exceptional example when the "shorter code is more readable" law does not
> apply. Also, Objects.requireNonNull has a version that can contain a nice
> message that is absolutely valuable for the caller. (I think, this was
> already discussed in detail.)
>
> The only reason I cannot argue with is performance if it was properly
> measured in the relevant desired use cases and the measurement proves that
> the assertions pose significant performance cost.

And it was my sole point.
I obviously agree that more info is nicer, and easier to
maintain.
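
For reference, the kind of explicit check being discussed, as a tiny
sketch (the class "Greeter" is invented for the example;
"Objects.requireNonNull(T, String)" is the standard JDK method):

```java
import java.util.Objects;

/** Hypothetical class with an explicit, documented null check. */
final class Greeter {
    private final String name;

    Greeter(String name) {
        // Fails fast with an NPE whose message names the offending argument.
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    String greet() {
        return "Hello, " + name;
    }
}
```

The check costs one extra call per construction; whether that matters
can only be told by measuring the relevant use cases.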

Gilles

>
> Regards,
> Peter
>
>
> On Thu, Mar 5, 2020 at 4:12 PM sebb  wrote:
>
> > On Wed, 4 Mar 2020 at 14:20, Gilles Sadowski  wrote:
> > >
> > > On Wed, 4 Mar 2020 at 15:16, sebb wrote:
> > > >
> > > > On Wed, 4 Mar 2020 at 14:09, Gary Gregory 
> > wrote:
> > > > >
> > > > > IMO, until we are all on Java 14 and benefit from its more detailed
> > NPE
> > > > > message, we need to call Validate.notNull _with a message_ that says
> > what
> > > > > variable blew up.
> > > >
> > > > +1
> > > >
> > > > That is another good point.
> > > >
> > > > Unless one has access to the exact same version of the source, it can
> > > > be very tricky to tell which variable has caused the NPE.
> > > > The same applies to letting the JRE throw the NPE.
> > >
> > > Are you assuming that one should be able to fix the bug
> > > without the stack trace?
> >
> > No, I'm saying that the stack trace is not much use without having
> > access to the exact version of the source that was used to create the
> > binary.
> > An approximate line number may not be sufficient to identify the variable.
> >
> > > If so, I fail to see how having the name of a variable will
> > > make it less tricky to locate the cause of the issue...
> >
> > See above.
> >
> > > Gilles
> > >
> > > >> [...]
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> --
> Peter Verhas
> pe...@verhas.com
> t: +41791542095
> skype: verhas


Re: [lang] NPE vs IAE

2020-03-05 Thread Gilles Sadowski

On Thu, 5 Mar 2020 at 16:12, sebb wrote:
>
> On Wed, 4 Mar 2020 at 14:20, Gilles Sadowski  wrote:
> >
> > On Wed, 4 Mar 2020 at 15:16, sebb wrote:
> > >
> > > On Wed, 4 Mar 2020 at 14:09, Gary Gregory  wrote:
> > > >
> > > > IMO, until we are all on Java 14 and benefit from its more detailed NPE
> > > > message, we need to call Validate.notNull _with a message_ that says 
> > > > what
> > > > variable blew up.
> > >
> > > +1
> > >
> > > That is another good point.
> > >
> > > Unless one has access to the exact same version of the source, it can
> > > be very tricky to tell which variable has caused the NPE.
> > > The same applies to letting the JRE throw the NPE.
> >
> > Are you assuming that one should be able to fix the bug
> > without the stack trace?
>
> No, I'm saying that the stack trace is not much use without having
> access to the exact version of the source that was used to create the
> binary.
> An approximate line number may not be sufficient to identify the variable.
>
> > If so, I fail to see how having the name of a variable will
> > make it less tricky to locate the cause of the issue...
>
> See above.

I'm not trying to prove that less information is better.

However without the exact source, having the exact
name of the variable is similarly of not much use.

IOW, with the exact source and in the case where
the variable is used by the method itself, the stack
shows exactly where the unexpected null occurred.

Moreover, if the null occurred because of a user error
(e.g. missing configuration directive), the link with the
variable name may not be obvious.
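
To make the trade-off concrete, here is a minimal sketch that uses only the
JDK ("Objects.requireNonNull" stands in for "Validate.notNull"; the class,
method and parameter names are made up for illustration).  It only shows that
the text passed to the check becomes the exception message; what the JRE
reports for the unchecked case depends on the Java version.

```java
import java.util.Objects;

class NamedNullCheckSketch {
    /** Explicit check: the exception message names the offending parameter. */
    static int lengthChecked(String configPath) {
        Objects.requireNonNull(configPath, "configPath");
        return configPath.length();
    }

    /** No check: the JRE throws the NPE at the first dereference. */
    static int lengthUnchecked(String configPath) {
        return configPath.length();
    }

    /** Runs the action and returns the NPE it throws, or null if there is none. */
    static NullPointerException npeOf(Runnable action) {
        try {
            action.run();
            return null;
        } catch (NullPointerException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // The message is exactly the text given to requireNonNull.
        System.out.println(npeOf(() -> lengthChecked(null)).getMessage());
        // The message (if any) is chosen by the JRE and varies across versions.
        System.out.println(npeOf(() -> lengthUnchecked(null)).getMessage());
    }
}
```

In both cases the stack trace points at the failing line; the explicit message
only helps when that line cannot be mapped back to the source.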

Gilles

Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-03-05 Thread Gilles Sadowski

>>> >> >the "Cluster"
>>> >> >interface?  What is the difference with method "getCenter" (defined by
>>> >> >class "CentroidCluster")?
>>> >>
>>> >> My understanding is:
>>> >>  * "Cluster" is a data class that carries the result of a clustering;
>>> >> "getCenter" is just a getter of CentroidCluster that returns the
>>> >> center point.
>>> >>  * "Cluster[er]" is an (interface of an) algorithm that classifies
>>> >> points into sets of Cluster.
>>> >>  * "CentroidCluster" is the result of a particular group of Clusterer
>>> >> algorithms such as k-means;
>>> >>  "centroidOf" is the specific logic that calculates the center point of
>>> >> a collection of points.
>>> >> [In contrast, the DBSCAN clustering algorithm does not care about the
>>> >> "Centroid".]
>>> >>
>>> >> So "centroidOf" may be a method of "CentroidCluster[er]" (which does
>>> >> not exist yet),
>>> >>  but it is different from "CentroidCluster.getCenter".
>>> >
>>> >I may be missing something about the existing design,
>>> >but it seems strange that "CentroidCluster" is initialized
>>> >with a given "center", yet it is possible to add points after
>>> >initialization (which IIUC would invalidate the "center").
>>>
>>> The "centroidOf" could be part of "CentroidCluster",
>>> but I think the existing design was focused on decoupling
>>> "DistanceMeasure" ("centroidOf" depends on it) from "CentroidCluster".
>>
>>I don't see why we need both "Cluster" and "CentroidCluster".
>>Indeed, as suggested before, the "center" can be computed
>>from a "Cluster", but does not need to be stored in it.
>
> A typical use case for a List is when we need to classify a new
> point,

Is there a use for "Cluster" instances that do not have a "center"?

> calculating the distance from the new point to each CentroidCluster.center is
> the simplest way,
> and the center should be cached.

Yes, but not necessarily within a dedicated class.
I agree that there should be an easy way (API-wise) to classify
a point wrt a list of clusters but we should avoid the potential
inconsistency of a mutable "Cluster" instance.
Efficiency during cluster building (among other things, caching the current
center) is a separate issue from using the resulting list of clusters.

>
>>
>>>
>>> Center recalculation happens in each iteration of k-means clustering,
>>> always together with the reassignment of points to clusters.
>>> We often use k-means in two phases:
>>> Phase 1: Training, i.e. classifying thousands of points into a set of clusters.
>>> Phase 2: Predict, i.e. predicting which cluster is best for a new point,
>>> or adding a new point to the best cluster in the ClusterSet,
>>
>>Method "cluster" returns a "List"; there is no need for a
>>new "ClusterSet" class.
>>Also, IIUC the centers can be collected into a "List",
>>so that the association is through the index into the list(s).
>>
>>> but we never update the cluster center until next retraining.
>>
>>IMO, that's the reason for *not* storing the center (in such a
>>mutable instance).
>>
>>>
>>> The KMeansPlusPlusClusterer and the other clustering algorithms in "commons-math"
>>> are designed only for the "Training" phase;
>>> it is clearer if we consider "CentroidCluster" as a pure data class
>>> that just holds the k-means clustering result.
>>
>>See above.
>>
>>Discussing the existing design further, I think that the "cluster" method
>> should
>>rather be:
>>---CUT---
>>public List<Cluster<T>> cluster(Collection<T> points, DistanceMeasure dist)
>>---CUT---
>>
>>And, similarly,
>>---CUT---
>>@FunctionalInterface
>>public interface ClusterFinder<T> {
>>    public int indexOf(T point, List<Cluster<T>> clusters, DistanceMeasure dist);
>>}
>>---CUT---
>
> I think there is a balance between flexibility, efficiency and usability.
> The DistanceMeasure should be assigned once for a kmeans,

Of course, during learning, only one distance measure is to
be used.
But it does not imply that it must be an instance field of the
clustering algorithm.

> and cannot change
> for existing clusters.

The "Cluster" class does not define a "DistanceMeasure" field.

> So DistanceMeasure should be a property of Cluster, but redundant for a List
> of Cluster.

Above, you said that it must be a property of "kmeans"...

> Considering that a Cluster is rarely used on its own, the DistanceMeasure
> should be a read-only property of the List,

Why?
Shouldn't a user be able to query the same list of clusters
using different distance measures?
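
As an illustration of that question, a minimal sketch in which the distance
is an argument of the query rather than a property of the clusters.  All
names are hypothetical (this is not the "commons-math" API), and each cluster
is reduced to its center for brevity.

```java
import java.util.Arrays;
import java.util.List;

class DistanceQuerySketch {
    /** Hypothetical stand-in for a distance measure between two points. */
    @FunctionalInterface
    interface Distance {
        double compute(double[] a, double[] b);
    }

    /** Straight-line (L2) distance. */
    static final Distance EUCLIDEAN = (a, b) -> {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** Largest coordinate difference (L-infinity distance). */
    static final Distance CHEBYSHEV = (a, b) -> {
        double max = 0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    };

    /** Index of the center closest to "point" according to "dist". */
    static int nearest(double[] point, List<double[]> centers, Distance dist) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            final double d = dist.compute(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {3, 3}, new double[] {0, 4});
        double[] point = {0, 0};
        // Same centers, same point: the answer depends on the measure.
        System.out.println(nearest(point, centers, EUCLIDEAN)); // prints 1
        System.out.println(nearest(point, centers, CHEBYSHEV)); // prints 0
    }
}
```

With the distance stored inside the clusters, the second query would require
rebuilding them.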

> If there is a ClusterSet, we can serialize and deserialize the Clusters
> correctly and without redundancy.

I don't know about "ClusterSet"; what is its interface?

Anyways, "serialization" is yet another issue, and it should be
dealt with separately from the usage API.
Do you suggest to (de)serialize "Cluster" objects (i.e. instances
of an unknown type)?

> As a comparison, dl4j's design is easy to use for prediction, especially the
> ClusterSet,
> but it lacks flexibility (strong coupling to nd4j) and efficiency (complexity
> of the nd4j implementation):
> https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nearestneighbors-parent/nearestneighbor-core/src/main/java/org/deeplearning4j/clustering

Please show stripped down versions of the API that illustrates
the points which you want to highlight.  Thanks.

>>
>>> If we want the clustering result to be useful for the "Predict" phase,
>>>  "KMeansPlusPlusClusterer.cluster" should return a
>>> "ClusterSet":
>>> ```java
>>> public interface ClusterSet<T> extends Collection<Cluster<T>> {
>>>   // Return the cluster to which the point should belong.
>>>   Cluster<T> predict(T point);
>>>   // Add a point to best cluster.
>>>   void addPoint(T point);
>>> }
>>> ```
>>
>>This  "ClusterSet" seems less flexible than a "List".
>
> "ClusterSet" is useful for predict as I mentioned above.

Cf. previous comment.

>>
>>> And "centroidOf" (used only in the clustering iterations) can move up into an
>>> abstract class like "CentroidClusterer".
>>
>>It seems that this method could be useful for users too.
>>
>
> Maybe "centroidOf" is useful, but for now it is only used in k-means.

Is "Kmeans" the only algorithm that computes clusters centers?
Do users never need to compute a center?

Best regards,
Gilles

>>> >It would seem that "center" should be a property computed
>>> >from the contents of "Cluster" e.g.:
>>> >
>>> >@FunctionalInterface
>>> >public interface ClusterCenterComputer<T> {
>>> >    T centroidOf(Cluster<T> cluster);
>>> >}
>>> >

Re: [lang] NPE vs IAE

2020-03-04 Thread Gilles Sadowski

On Wed, 4 Mar 2020 at 15:16, Gilles Sadowski wrote:
>
> On Wed, 4 Mar 2020 at 15:09, Gary Gregory wrote:
> >
> > IMO, until we are all on Java 14 and benefit from its more detailed NPE
> > message, we need to call Validate.notNull _with a message_ that says what
> > variable blew up.
>
> No, we don't *need* to (for the reason stated previously), but you
> may *want* to.

Anyways, see my initial reply; changing from throwing IAE to
throwing NPE is a first step towards ultimately dropping the
double-check while remaining functionally compatible.

Gilles

> > > [...]

Re: [lang] NPE vs IAE

2020-03-04 Thread Gilles Sadowski

On Wed, 4 Mar 2020 at 15:16, sebb wrote:
>
> On Wed, 4 Mar 2020 at 14:09, Gary Gregory  wrote:
> >
> > IMO, until we are all on Java 14 and benefit from its more detailed NPE
> > message, we need to call Validate.notNull _with a message_ that says what
> > variable blew up.
>
> +1
>
> That is another good point.
>
> Unless one has access to the exact same version of the source, it can
> be very tricky to tell which variable has caused the NPE.
> The same applies to letting the JRE throw the NPE.

Are you assuming that one should be able to fix the bug
without the stack trace?
If so, I fail to see how having the name of a variable will
make it less tricky to locate the cause of the issue...

Gilles

>> [...]

Re: [lang] NPE vs IAE

2020-03-04 Thread Gilles Sadowski

On Wed, 4 Mar 2020 at 15:09, Gary Gregory wrote:
>
> IMO, until we are all on Java 14 and benefit from its more detailed NPE
> message, we need to call Validate.notNull _with a message_ that says what
> variable blew up.

No, we don't *need* to (for the reason stated previously), but you
may *want* to.

Gilles

>
> Gary
>
> On Wed, Mar 4, 2020 at 9:01 AM Gilles Sadowski  wrote:
>
> > On Wed, 4 Mar 2020 at 14:19, Gary Gregory wrote:
> > >
> > > On Wed, Mar 4, 2020 at 7:58 AM sebb  wrote:
> > >
> > > > On Sat, 29 Feb 2020 at 18:09, Gilles Sadowski 
> > > > wrote:
> > > > >
> > > > > On Sat, 29 Feb 2020 at 18:39, Gary Gregory wrote:
> > > > > >
> > > > > > On Sat, Feb 22, 2020 at 5:25 PM Gary Gregory <
> > garydgreg...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > > I would like to do the same in Lang as with Collections (see below.)
> > > > > > >
> > > > > > > We currently perform checks like:
> > > > > > >
> > > > > > > Validate.isTrue(foo != null, ...)
> > > > > > >
> > > > > > > Which should be IMO:
> > > > > > >
> > > > > > > Validate.notNull(foo, ...);
> > > > > > >
> > > > > > > > The difference being that the former throws IAE and the latter NPE.
> > > > > > >
> > > > > > > As with [collections], my argument is the same, the JRE uses
> > > > > > > Objects.requireNonNull() to throw an NPE, so I'd like to keep
> > > > normalizing
> > > > > > > on that.
> > > > > > >
> > > > > >
> > > > > > Any thoughts? Should I proceed?
> > > > >
> > > >
> > > > +1
> > > >
> > > > > +1 for NPE on unexpected null.
> > > > > [But perhaps it is not necessary to double-check, as the JRE
> > > > > will do it anyway (and throw NPE).]
> > > >
> > >
> > > This is not about dropping the check, please re-read the thread,
> > > specifically:
> >
> > Please re-read my reply (3 and 4 lines above).
> > In line with what Sebb mentions; if the code is akin to
> > ---CUT---
> > void doSomethingWith(Object foo) {
> >   Validate.notNull(foo, ...);  // Provably unnecessary because...
> >   System.out.println("foo=" + foo.toString()); // ... this will throw NPE.
> >   // ...
> > }
> > ---CUT---
> > Then the check could be dropped, especially if it would throw
> > NPE instead of IAE.
> >
> > In fact, my impression is that, when we control the chain of calls
> > (e.g. when there is no external dependency), there is always a
> > point where NPE will be thrown (by the JRE) unless the reference
> > is never used; hence the check is not strictly necessary; it will only
> > abort the chain earlier, at the cost of doing many checks that are
> > not needed in the nominal case (where the reference is not null).
> >
> > Gilles
> >
> > >
> > > We currently perform checks like:
> > >
> > > Validate.isTrue(foo != null, ...)
> > >
> > > Which should be IMO:
> > >
> > > Validate.notNull(foo, ...);
> > >
> > > notNull calls the JRE's Objects.requireNonNull()
> > >
> > >
> > > > Depends on how and where the variable is used -- sometimes a null can
> > > > cause hard to track errors, not necessarily NPE, and the exception (or
> > > > possibly other failure) may be a long way from the initial method
> > > > call.
> > > >
> > > > It's almost always better to report the problem as early as possible.
> > > >
> > > > I think the existing checks should be kept.
> > > >
> > >
> > > > Agreed, but I am talking about performing a better, cleaner check by using
> > > Validate.notNull(foo, ...) instead of  Validate.isTrue(foo != null, ...)
> > >
> > > Gary
> > >
> > >
> > > >
> > > > If it can be proved that the code would generate NPE immediately
> > > > anyway, it might be worth dropping the check.
> > > > However, code changes, so at least a comment should be left in its
> > > > place to note that the code is relying o

Re: [lang] NPE vs IAE

2020-03-04 Thread Gilles Sadowski

On Wed, 4 Mar 2020 at 14:19, Gary Gregory wrote:
>
> On Wed, Mar 4, 2020 at 7:58 AM sebb  wrote:
>
> > On Sat, 29 Feb 2020 at 18:09, Gilles Sadowski 
> > wrote:
> > >
> > > On Sat, 29 Feb 2020 at 18:39, Gary Gregory wrote:
> > > >
> > > > On Sat, Feb 22, 2020 at 5:25 PM Gary Gregory 
> > wrote:
> > > >
> > > > > > I would like to do the same in Lang as with Collections (see below.)
> > > > >
> > > > > We currently perform checks like:
> > > > >
> > > > > Validate.isTrue(foo != null, ...)
> > > > >
> > > > > Which should be IMO:
> > > > >
> > > > > Validate.notNull(foo, ...);
> > > > >
> > > > > > The difference being that the former throws IAE and the latter NPE.
> > > > >
> > > > > As with [collections], my argument is the same, the JRE uses
> > > > > Objects.requireNonNull() to throw an NPE, so I'd like to keep
> > normalizing
> > > > > on that.
> > > > >
> > > >
> > > > Any thoughts? Should I proceed?
> > >
> >
> > +1
> >
> > > +1 for NPE on unexpected null.
> > > [But perhaps it is not necessary to double-check, as the JRE
> > > will do it anyway (and throw NPE).]
> >
>
> This is not about dropping the check, please re-read the thread,
> specifically:

Please re-read my reply (3 and 4 lines above).
In line with what Sebb mentions; if the code is akin to
---CUT---
void doSomethingWith(Object foo) {
  Validate.notNull(foo, ...);  // Provably unnecessary because...
  System.out.println("foo=" + foo.toString()); // ... this will throw NPE.
  // ...
}
---CUT---
Then the check could be dropped, especially if it would throw
NPE instead of IAE.

In fact, my impression is that, when we control the chain of calls
(e.g. when there is no external dependency), there is always a
point where NPE will be thrown (by the JRE) unless the reference
is never used; hence the check is not strictly necessary; it will only
abort the chain earlier, at the cost of doing many checks that are
not needed in the nominal case (where the reference is not null).
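
A minimal sketch of the two situations discussed in this thread (hypothetical
classes, JDK only).  When the reference is merely stored, the NPE surfaces
later and elsewhere; when the check is done up front, the chain is aborted at
the entry point.

```java
import java.util.Objects;

class EarlyVersusLateSketch {
    /** Stores the reference without using it: a null goes unnoticed here. */
    static final class LazyHolder {
        private final String name;

        LazyHolder(String name) {
            this.name = name;
        }

        int nameLength() {
            return name.length(); // the JRE throws the NPE only here
        }
    }

    /** Checks up front: a null is rejected by the constructor. */
    static final class EagerHolder {
        private final String name;

        EagerHolder(String name) {
            this.name = Objects.requireNonNull(name, "name");
        }

        int nameLength() {
            return name.length();
        }
    }

    public static void main(String[] args) {
        LazyHolder lazy = new LazyHolder(null); // no failure yet
        try {
            lazy.nameLength();
        } catch (NullPointerException e) {
            System.out.println("late failure, in nameLength()");
        }
        try {
            new EagerHolder(null);
        } catch (NullPointerException e) {
            System.out.println("early failure, in the constructor");
        }
    }
}
```

Both variants end with an NPE; they differ only in where the stack trace
points and in the cost of the extra check on every non-null call.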

Gilles

>
> We currently perform checks like:
>
> Validate.isTrue(foo != null, ...)
>
> Which should be IMO:
>
> Validate.notNull(foo, ...);
>
> notNull calls the JRE's Objects.requireNonNull()
>
>
> > Depends on how and where the variable is used -- sometimes a null can
> > cause hard to track errors, not necessarily NPE, and the exception (or
> > possibly other failure) may be a long way from the initial method
> > call.
> >
> > It's almost always better to report the problem as early as possible.
> >
> > I think the existing checks should be kept.
> >
>
> Agreed, but I am talking about performing a better, cleaner check by using
> Validate.notNull(foo, ...) instead of  Validate.isTrue(foo != null, ...)
>
> Gary
>
>
> >
> > If it can be proved that the code would generate NPE immediately
> > anyway, it might be worth dropping the check.
> > However, code changes, so at least a comment should be left in its
> > place to note that the code is relying on Java to generate the NPE at
> > this point.
> >
>
> > > Gilles
> > >
> > > >
> > > > Gary
> > > >
> > > >
> > > > >
> > > > > Gary
> > > > >
> > > > > -- Forwarded message -
> > > > > From: Gary Gregory 
> > > > > Date: Tue, Dec 10, 2019 at 9:59 AM
> > > > > Subject: Re: [collection] NPE vs IAE in
> > > > > org.apache.commons.collections4.CollectionUtils
> > > > > To: Commons Developers List , Bruno P.
> > Kinoshita <
> > > > > brunodepau...@yahoo.com.br>
> > > > >
> > > > >
> > > > > FTR, using requireNonNull is also an 'Effective Java' recommendation.
> > > > >
> > > > > Gary
> > > > >
> > > > > On Thu, Dec 5, 2019 at 4:54 PM Bruno P. Kinoshita
> > > > >  wrote:
> > > > >
> > > > >>  +1 for NPE
> > > > >>
> > > > >> On Friday, 6 December 2019, 5:22:34 am NZDT, Gary Gregory <
> > > > >> garydgreg...@gmail.com> wrote:
> > > > >>
> > > > >>  Hi All:
> > > > >>
> > > > >> org.apache.commons.collections4.CollectionUtils contains a mix of
> > checking
> > > > >> for null inputs by throwing NullPointerExceptions in some methods
> > and
> > > > >> IllegalArgumentExceptions in others.
> > > > >>
> > > > > >> I propose we standardize on NPE simply because the JRE provides
> > > > >> Objects.requireNonNull() just for this purpose.
> > > > >>
> > > > >> Gary
> > > > >>
> > > > >
> > > > >
> > >

Re: [collections] Bloom filters

2020-03-02 Thread Gilles Sadowski

Hello.

On Mon, 2 Mar 2020 at 14:19, Manas Kalangan wrote:
>
> hey guys , i am manas , 2nd year computer engineering student, this is my
> first time in GSoC, could someone help me with project idea?

Welcome to the Commons project's discussion forum, but please
do not hijack threads.[1]

Best regards,
Gilles

[1] https://www.urbandictionary.com/define.php?term=Thread%20Hijacking

> On Mon, Mar 2, 2020 at 6:42 PM Alex Herbert 
> wrote:
>
> >
> > [Entirely unrelated conversation skipped.]

Re: [lang] NPE vs IAE

2020-02-29 Thread Gilles Sadowski

On Sat, 29 Feb 2020 at 18:39, Gary Gregory wrote:
>
> On Sat, Feb 22, 2020 at 5:25 PM Gary Gregory  wrote:
>
> > I would like to do the same in Lang as with Collections (see below.)
> >
> > We currently perform checks like:
> >
> > Validate.isTrue(foo != null, ...)
> >
> > Which should be IMO:
> >
> > Validate.notNull(foo, ...);
> >
> > The difference being that the former throws IAE and the latter NPE.
> >
> > As with [collections], my argument is the same, the JRE uses
> > Objects.requireNonNull() to throw an NPE, so I'd like to keep normalizing
> > on that.
> >
>
> Any thoughts? Should I proceed?

+1 for NPE on unexpected null.
[But perhaps it is not necessary to double-check, as the JRE
will do it anyway (and throw NPE).]
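
For reference, a minimal sketch of the behavioural difference under
discussion.  It uses JDK-only stand-ins so that it runs without
"commons-lang": the local "isTrue" helper is an assumption that mimics
"Validate.isTrue" (which throws IAE), and "Objects.requireNonNull" plays the
role of "Validate.notNull" (which throws NPE).

```java
import java.util.Objects;

class NullCheckStylesSketch {
    /** Stand-in for Validate.isTrue(condition, message): throws IAE. */
    static void isTrue(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Current style: a null argument is reported as an IAE. */
    static String oldStyle(Object foo) {
        isTrue(foo != null, "foo must not be null");
        return foo.toString();
    }

    /** Proposed style: a null argument is reported as an NPE. */
    static String newStyle(Object foo) {
        Objects.requireNonNull(foo, "foo");
        return foo.toString();
    }

    /** Returns the class of the exception thrown by the action, or null. */
    static Class<?> thrownBy(Runnable action) {
        try {
            action.run();
            return null;
        } catch (RuntimeException e) {
            return e.getClass();
        }
    }

    public static void main(String[] args) {
        System.out.println(thrownBy(() -> oldStyle(null)));
        System.out.println(thrownBy(() -> newStyle(null)));
    }
}
```

Callers that catch IAE for null arguments would be affected by the change,
which is why the switch is not purely cosmetic.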

Gilles

>
> Gary
>
>
> >
> > Gary
> >
> > -- Forwarded message -
> > From: Gary Gregory 
> > Date: Tue, Dec 10, 2019 at 9:59 AM
> > Subject: Re: [collection] NPE vs IAE in
> > org.apache.commons.collections4.CollectionUtils
> > To: Commons Developers List , Bruno P. Kinoshita <
> > brunodepau...@yahoo.com.br>
> >
> >
> > FTR, using requireNonNull is also an 'Effective Java' recommendation.
> >
> > Gary
> >
> > On Thu, Dec 5, 2019 at 4:54 PM Bruno P. Kinoshita
> >  wrote:
> >
> >>  +1 for NPE
> >>
> >> On Friday, 6 December 2019, 5:22:34 am NZDT, Gary Gregory <
> >> garydgreg...@gmail.com> wrote:
> >>
> >>  Hi All:
> >>
> >> org.apache.commons.collections4.CollectionUtils contains a mix of checking
> >> for null inputs by throwing NullPointerExceptions in some methods and
> >> IllegalArgumentExceptions in others.
> >>
> >> I propose we standardize on NPE simply because the JRE provides
> >> Objects.requireNonNull() just for this purpose.
> >>
> >> Gary
> >>
> >
> >

Re: Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-28 Thread Gilles Sadowski

erer 
> >> algorithm like k-means,
> >>  "centroidOf" is the specific logic that calculates the center point of a
> >> collection of points.
> >> [In contrast, the DBSCAN clustering algorithm does not care about the "Centroid".]
> >>
> >> So "centroidOf" may be a method of "CentroidCluster[er]" (which does not exist yet),
> >>  but it is different from "CentroidCluster.getCenter".
> >
> >I may be missing something about the existing design,
> >but it seems strange that "CentroidCluster" is initialized
> >with a given "center", yet it is possible to add points after
> >initialization (which IIUC would invalidate the "center").
>
> The "centroidOf" could be part of "CentroidCluster",
> but I think the existing design was focused on decoupling
> "DistanceMeasure" ("centroidOf" depends on it) from "CentroidCluster".

I don't see why we need both "Cluster" and "CentroidCluster".
Indeed, as suggested before, the "center" can be computed
from a "Cluster", but does not need to be stored in it.

>
> Center recalculation happens in each iteration of k-means clustering,
> always together with the reassignment of points to clusters.
> We often use k-means in two phases:
> Phase 1: Training, i.e. classifying thousands of points into a set of clusters.
> Phase 2: Predict, i.e. predicting which cluster is best for a new point,
> or adding a new point to the best cluster in the ClusterSet,

Method "cluster" returns a "List"; there is no need for a
new "ClusterSet" class.
Also, IIUC the centers can be collected into a "List",
so that the association is through the index into the list(s).

> but we never update the cluster center until next retraining.

IMO, that's the reason for *not* storing the center (in such a
mutable instance).

>
> The KMeansPlusPlusClusterer and the other clustering algorithms in "commons-math"
> are designed only for the "Training" phase;
> it is clearer if we consider "CentroidCluster" as a pure data class that just
> holds the k-means clustering result.

See above.

Discussing the existing design further, I think that the "cluster" method should
rather be:
---CUT---
public List<Cluster<T>> cluster(Collection<T> points, DistanceMeasure dist)
---CUT---

And, similarly,
---CUT---
@FunctionalInterface
public interface ClusterFinder<T> {
    public int indexOf(T point, List<Cluster<T>> clusters, DistanceMeasure dist);
}
---CUT---
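
A minimal sketch of how such a finder could be implemented.  The types
"Clusterable", "DistanceMeasure" and "Cluster" below are simplified stand-ins
defined locally (not the actual "commons-math" classes), and the choice of
"nearest centroid" as the lookup rule is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ClusterFinderSketch {
    interface Clusterable { double[] getPoint(); }

    interface DistanceMeasure { double compute(double[] a, double[] b); }

    /** Immutable stand-in for a cluster: it holds points, but no center. */
    static final class Cluster<T extends Clusterable> {
        private final List<T> points;

        Cluster(List<T> points) {
            this.points = Collections.unmodifiableList(new ArrayList<>(points));
        }

        List<T> getPoints() { return points; }
    }

    @FunctionalInterface
    interface ClusterFinder<T extends Clusterable> {
        int indexOf(T point, List<Cluster<T>> clusters, DistanceMeasure dist);
    }

    /** Mean of the points of a (non-empty) cluster, computed on demand. */
    static double[] centroidOf(Cluster<? extends Clusterable> cluster) {
        List<? extends Clusterable> points = cluster.getPoints();
        double[] center = new double[points.get(0).getPoint().length];
        for (Clusterable p : points) {
            double[] coords = p.getPoint();
            for (int i = 0; i < center.length; i++) {
                center[i] += coords[i] / points.size();
            }
        }
        return center;
    }

    /** Finder that returns the index of the cluster with the nearest centroid. */
    static <T extends Clusterable> ClusterFinder<T> nearestCentroid() {
        return (point, clusters, dist) -> {
            int best = -1;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                double d = dist.compute(point.getPoint(), centroidOf(clusters.get(i)));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = i;
                }
            }
            return best;
        };
    }

    public static void main(String[] args) {
        DistanceMeasure euclidean = (a, b) -> Math.hypot(a[0] - b[0], a[1] - b[1]);
        Clusterable p1 = () -> new double[] {0, 0};
        Clusterable p2 = () -> new double[] {2, 0};
        Clusterable p3 = () -> new double[] {10, 10};
        List<Cluster<Clusterable>> clusters = new ArrayList<>();
        clusters.add(new Cluster<Clusterable>(Arrays.asList(p1, p2)));
        clusters.add(new Cluster<Clusterable>(Arrays.asList(p3)));
        Clusterable query = () -> new double[] {9, 9};
        ClusterFinder<Clusterable> finder = nearestCentroid();
        System.out.println(finder.indexOf(query, clusters, euclidean)); // prints 1
    }
}
```

Recomputing the centroid on every query is wasteful; a real implementation
would cache the centers in a separate list, associated by index.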

> If we want the clustering result to be useful for the "Predict" phase,
>  "KMeansPlusPlusClusterer.cluster" should return a
> "ClusterSet":
> ```java
> public interface ClusterSet<T> extends Collection<Cluster<T>> {
>   // Return the cluster to which the point should belong.
>   Cluster<T> predict(T point);
>   // Add a point to best cluster.
>   void addPoint(T point);
> }
> ```

This  "ClusterSet" seems less flexible than a "List".

> And "centroidOf" (used only in the clustering iterations) can move up into an
> abstract class like "CentroidClusterer".

It seems that this method could be useful for users too.

Best,
Gilles

> >It would seem that "center" should be a property computed
> >from the contents of "Cluster" e.g.:
> >
> >@FunctionalInterface
> >public interface ClusterCenterComputer<T> {
> >    T centroidOf(Cluster<T> cluster);
> >}
> >
> >Regards,
> >Gilles
> >

Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-27 Thread Gilles Sadowski

Hi.

On Thu, 27 Feb 2020 at 06:17, chentao...@qq.com wrote:
>
> Hi,
>
> > [...]
> >> >>
> >> >> Do you mean I should file a JIRA issue about reusing "centroidOf"
> >> >> and "chooseInitialCenters",
> >> >> then start a PR and a discussion about "ClusterUtils"?
> >> >> And then start the PR of "MiniBatchKMeansClusterer" after all that is done?
> >> >
> >> >I cannot guarantee that the whole process will be streamlined.
> >> >In effect, you can work on multiple branches (one for each
> >> >prospective PR).
> >> >I'd say that you should start by describing (here on the ML) the
> >> >rationale for "ClusterUtils" (and contrast it with say, a common
> >> >base class).
> >> >[Only when the design has been agreed on,  a JIRA issue to
> >> >implement it should be created in order to track the actual
> >> >coding work).]
> >>
> >> OK, I think we should start from here:
> >>
> >> The methods "centroidOf" and "chooseInitialCenters" in
> >> KMeansPlusPlusClusterer
> >>  could be reused by other k-means clusterers, like the MiniBatchKMeansClusterer
> >> which I want to implement.
> >>
> >> There are two solutions for reusing "centroidOf" and "chooseInitialCenters":
> >> 1. Extract an abstract class for k-means clusterers, named
> >> "AbstractKMeansClusterer",
> >>  and move "centroidOf" and "chooseInitialCenters" into it as protected
> >> methods;
> >>  the EmptyClusterStrategy and related logic can also move to
> >> "AbstractKMeansClusterer".
> >> 2. Create a static utility class, and move "centroidOf" and
> >> "chooseInitialCenters" into it;
> >>  some useful clustering methods, like predict (predict which cluster is
> >> best for a specified point), can also be put in it.
> >>
> >
> >At first sight, I prefer option 1.
> >Indeed, among other things, "chooseInitialCenters" is a method that is of no
> >interest to users of the functionality (and so should not be part of the
> >"public" API).
>
> That is a persuasive explanation, and I agree with you that extracting an
> abstract class for KMeans is better.
> How can we reach a conclusion?
> -
>
> Speaking of the "public API", I suppose there should be a series of
> "CentroidInitializer" implementations
>  that "chooseInitialCenters" with various algorithms.
> The k-means++ clustering algorithm is a special implementation of k-means
>  which initializes the cluster centers with the k-means++ algorithm.
> So if there is a "CentroidInitializer", "KMeansPlusPlusClusterer" can be a
> "KMeansClusterer"
>  with a "KMeansPlusPlusCentroidInitializer" strategy.
> When "KMeansClusterer" is initialized with a "RandomCentroidInitializer", it is
> plain k-means.
>
> --
> >Method "centroidOf" looks generally useful.  Shouldn't it be part of
> >the "Cluster" interface?  What is the difference with method "getCenter"
> >(defined by class "CentroidCluster")?
>
> My understanding is:
>  * "Cluster" is a data class that carries the result of a clustering;
> "getCenter" is just a getter of CentroidCluster that returns the
> center point.
>  * "Cluster[er]" is an (interface of an) algorithm that classifies points into
> sets of Cluster.
>  * "CentroidCluster" is the result of a particular group of Clusterer algorithms
> such as k-means;
>  "centroidOf" is the specific logic that calculates the center point of a
> collection of points.
> [In contrast, the DBSCAN clustering algorithm does not care about the "Centroid".]
>
> So "centroidOf" may be a method of "CentroidCluster[er]" (which does not exist yet),
>  but it is different from "CentroidCluster.getCenter".

I may be missing something about the existing design,
but it seems strange that "CentroidCluster" is initialized
with a given "center", yet it is possible to add points after
initialization (which IIUC would invalidate the "center").
It would seem that "center" should be a property computed
from the contents of "Cluster" e.g.:

@FunctionalInterface
public interface ClusterCenterComputer<T> {
    T centroidOf(Cluster<T> cluster);
}
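
To illustrate the concern, a minimal sketch with hypothetical, simplified
types (points are plain "double[]", and the computer takes the list of points
rather than a "Cluster").  It shows a center fixed at construction going
stale after a point is added, while the computed center stays consistent.

```java
import java.util.ArrayList;
import java.util.List;

class CenterComputerSketch {
    /** Simplified variant of the interface above, working on raw points. */
    @FunctionalInterface
    interface ClusterCenterComputer {
        double[] centroidOf(List<double[]> points);
    }

    /** Arithmetic mean of the points (assumes a non-empty list). */
    static final ClusterCenterComputer MEAN = points -> {
        double[] center = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < center.length; i++) {
                center[i] += p[i] / points.size();
            }
        }
        return center;
    };

    /** Mimics the questioned pattern: center given once, points added later. */
    static final class StoredCenterCluster {
        final double[] center;
        final List<double[]> points = new ArrayList<>();

        StoredCenterCluster(double[] center) {
            this.center = center;
        }

        void addPoint(double[] p) {
            points.add(p);
        }
    }

    public static void main(String[] args) {
        StoredCenterCluster cluster = new StoredCenterCluster(new double[] {0, 0});
        cluster.addPoint(new double[] {0, 0});
        cluster.addPoint(new double[] {2, 2});
        double[] computed = MEAN.centroidOf(cluster.points);
        // The stored center is still (0, 0); the computed one is (1, 1).
        System.out.println(cluster.center[0] + " vs " + computed[0]);
    }
}
```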

Regards,
Gilles

Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-26 Thread Gilles Sadowski

Hello.

[Message formatting is fine now.  Thanks!]

On Wed, 26 Feb 2020 at 15:20, chentao...@qq.com wrote:
>
> Hi,
>
> >Hello.
> >
> >[Please try and set your mail client to send plain text messages.]
> >
> >On Wed, 26 Feb 2020 at 14:05, CT wrote:
> >>
> >> Hi Gilles,
> >> --Original--
> >> From: "Gilles Sadowski"
> >> Date: Wed, Feb 26, 2020 05:41 PM
> >> To: "Commons Developers List"
> >> Subject:Re: [math] Discuss: New feature MiniBatchKMeansClusterer
> >>
> >>
> >>
> > [...]
> >>
> >> Do you mean I should file a JIRA issue about reusing "centroidOf" and
> >> "chooseInitialCenters",
> >> then start a PR and a discussion about "ClusterUtils"?
> >> And then start the PR of "MiniBatchKMeansClusterer" after all that is done?
> >
> >I cannot guarantee that the whole process will be streamlined.
> >In effect, you can work on multiple branches (one for each
> >prospective PR).
> >I'd say that you should start by describing (here on the ML) the
> >rationale for "ClusterUtils" (and contrast it with say, a common
> >base class).
> >[Only when the design has been agreed on,  a JIRA issue to
> >implement it should be created in order to track the actual
> >coding work).]
>
> OK, I think we should start from here:
>
> The methods "centroidOf" and "chooseInitialCenters" in KMeansPlusPlusClusterer
>  could be reused by other k-means clusterers, like the MiniBatchKMeansClusterer
> which I want to implement.
>
> There are two solutions for reusing "centroidOf" and "chooseInitialCenters":
> 1. Extract an abstract class for k-means clusterers, named
> "AbstractKMeansClusterer",
>  and move "centroidOf" and "chooseInitialCenters" into it as protected methods;
>  the EmptyClusterStrategy and related logic can also move to
> "AbstractKMeansClusterer".
> 2. Create a static utility class, and move "centroidOf" and
> "chooseInitialCenters" into it;
>  some useful clustering methods, like predict (predict which cluster is best
> for a specified point), can also be put in it.
>

At first sight, I prefer option 1.
Indeed, among other things, "chooseInitialCenters" is a method that is of no interest to
users of the functionality (and so should not be part of the "public" API).
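
A minimal sketch of what option 1 could look like, with hypothetical names
and deliberately simplified logic (random seeding instead of k-means++, a
single assignment pass instead of an iteration, points as plain "double[]").
It only illustrates how "protected" helpers are shared by subclasses while
staying out of the users' view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

abstract class AbstractKMeansSketch {
    protected final int k;
    protected final Random rng;

    protected AbstractKMeansSketch(int k, Random rng) {
        this.k = k;
        this.rng = rng;
    }

    /** The only method meant for users. */
    public abstract List<List<double[]>> cluster(List<double[]> points);

    /** Picks k distinct input points at random; requires k <= points.size(). */
    protected List<double[]> chooseInitialCenters(List<double[]> points) {
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }

    /** Index of the center nearest to the point (squared Euclidean distance). */
    protected int nearest(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double sum = 0;
            for (int j = 0; j < point.length; j++) {
                double d = point[j] - centers.get(i)[j];
                sum += d * d;
            }
            if (sum < bestDistance) {
                bestDistance = sum;
                best = i;
            }
        }
        return best;
    }
}

/** Toy subclass: one assignment pass, no center update. */
class SingleAssignmentKMeans extends AbstractKMeansSketch {
    SingleAssignmentKMeans(int k, long seed) {
        super(k, new Random(seed));
    }

    @Override
    public List<List<double[]>> cluster(List<double[]> points) {
        List<double[]> centers = chooseInitialCenters(points);
        List<List<double[]>> clusters = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            clusters.add(new ArrayList<>());
        }
        for (double[] p : points) {
            clusters.get(nearest(p, centers)).add(p);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 1},
            new double[] {9, 9}, new double[] {9, 8});
        System.out.println(new SingleAssignmentKMeans(2, 42L).cluster(points).size());
    }
}
```

Note that "protected" members still have to be kept compatible across
releases, so this only narrows the audience of the helpers.
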
Method "centroidOf" looks generally useful.  Shouldn't it be part of
the "Cluster" interface?  What is the difference with method "getCenter"
(defined by class
"CentroidCluster")?

Regards,
Gilles

Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-26 Thread Gilles Sadowski

Hello.

[Please try and set your mail client to send plain text messages.]

On Wed, 26 Feb 2020 at 14:05, CT wrote:
>
> Hi Gilles,
> --Original--
> From: "Gilles Sadowski"
> Date: Wed, Feb 26, 2020 05:41 PM
> To: "Commons Developers List"
> Subject:Re: [math] Discuss: New feature MiniBatchKMeansClusterer
>
>
>
> [...]
>
>  Do you mean this:
>   * For JIRA issue #MATH-1509, start a PR with
> "MiniBatchKMeansClusterer" but without "ClusterUtils",
>  despite the duplicate code between "MiniBatchKMeansClusterer" and
> "KMeansPlusPlusClusterer",
>  and also with "CentroidInitializer" and the test code, all within a single
> commit.
>   * Suggestions like "remove the constructors with default
> parameters" should be applied as a new commit to the PR above,
>  and tracked by a subtask of JIRA issue #MATH-1509.
>   * File a new JIRA issue for the duplicate code, and start
> another PR that contains "ClusterUtils"
>  and extracts the duplicate code into "ClusterUtils".
> 
> No, you should start with the smallest possible self-contained PR.
> For example, why should we commit a code that defines several
> constructors, while we already know that a second commit should
> remove them?
> 
> As you've noticed that some functionality must be factored out of
> "KMeansPlusPlusClusterer", this should be done first as a separate
> JIRA issue. IIUC, you propose "ClusterUtils". By reviewing a
> minimal PR, we should be able to examine whether another
> approach might be better (than a "utility" class) in order to expose
> functionality common to all clusterer algorithms.
> For example, could all "Kmeans" implementations inherit from
> a common base class?
>
> Do you mean I should fire a JIRA issue about reuse"centroidOf" and 
> "chooseInitialCenters",
> then start a PR and a disscuss about "ClusterUtils"?
> And thenstart the PR of "MiniBatchKMeansClusterer" after all done?

I cannot guarantee that the whole process will be streamlined.
In effect, you can work on multiple branches (one for each
prospective PR).
I'd say that you should start by describing (here on the ML) the
rationale for "ClusterUtils" (and contrast it with say, a common
base class).
[Only when the design has been agreed on should a JIRA issue to
implement it be created, in order to track the actual coding work.]

Gilles

> [...]


Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-26 Thread Gilles Sadowski

Hello.

On Wed, 26 Feb 2020 at 03:49, CT wrote:
>
> Hi Gilles,
>
>
> -- Original --
> From: "Gilles Sadowski"
> Date: Wed, Feb 26, 2020 00:32 AM
> To: "Commons Developers List"
> Subject: Re: [math] Discuss: New feature MiniBatchKMeansClusterer
>
> > Hello.
> >
> > [Side question: Do you send a copy of your posts directly to me?
> > If so, that is not necessary and is even annoying because when I
> > hit "reply" in my mail client, the conversation continues off-list...]
>
> Yes, I always copy you.  As far as I can tell, the ML has never forwarded
> my emails up to now (I subscribed to the ML with my other email address
> and never received my own emails).
> Sorry for annoying you, but you are the only person who received my emails,
> because I CC'd you.

This last mail of yours was received by the mailing list, as witnessed
by the archive copy:
https://markmail.org/message/qjk6zmdypqigxl2k

Hitting "reply" in my mail client now works as expected (setting the
destination to the ML).

However, you sent the same post twice.  And the quoted text is
somewhat mangled again by the quotation mark being replaced
by the corresponding HTML entity ("&quot;").

> > On Tue, 25 Feb 2020 at 14:53, CT wrote:
> > >
> > > Hi Gilles:
> > >   Sorry for my unfamiliarity with contributing. I started a new PR for
> > > most of your suggestions:
> > > https://github.com/apache/commons-math/pull/120
> >
> > Sorry for seemingly nit-picking but the global issue is the same
> > as with PR #118: It contains too many unrelated changes.
> > There should be *one* PR for each batch of significant changes.
> > More precisely, for this work, that would entail (a.o. things), one
> > PR for each of the following:
> >  * Class "ClusterUtils" (design yet to be discussed)
> >  * Factoring out whatever code is necessary for your proposed
> > feature (that would otherwise be duplicated)
> >
> > We try and avoid bloating the API, hence the changes which
> > I've suggested:
> >  * remove the constructors with default parameters
> >
> > Overall, each PR should probably contain a single commit, in
> > order to ease review.
>
> Do you mean this:
>  * For JIRA issue MATH-1509, start a PR with "MiniBatchKMeansClusterer"
>    but without "ClusterUtils" (despite the duplicate code between
>    "MiniBatchKMeansClusterer" and "KMeansPlusPlusClusterer"), together
>    with "CentroidInitializer" and the test code, in a single commit.
>  * Suggestions like "remove the constructors with default parameters"
>    should be applied as a new commit of the PR above, and tracked by a
>    sub-task of JIRA issue MATH-1509.
>  * File a new JIRA issue for the duplicate code, and start another PR
>    that introduces "ClusterUtils" and extracts the duplicate code into it.

No, you should start with the smallest possible self-contained PR.
For example, why should we commit a code that defines several
constructors, while we already know that a second commit should
remove them?

As you've noticed that some functionality must be factored out of
"KMeansPlusPlusClusterer", this should be done first as a separate
JIRA issue. IIUC, you propose "ClusterUtils".  By reviewing a
minimal PR, we should be able to examine whether another
approach might be better (than a "utility" class) in order to expose
functionality common to all clusterer algorithms.
For example, could all "Kmeans" implementations inherit from
a common base class?
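
[Illustration only: a minimal sketch of the "common base class"
alternative.  All names below ("AbstractKMeansSketch", "nearestCenter")
are hypothetical and chosen for this example; points are plain "double[]"
to keep the sketch free of dependencies.]

```java
import java.util.List;

/** Hypothetical base class; names are illustrative, not the Commons Math API. */
abstract class AbstractKMeansSketch {
    /** Number of clusters. */
    private final int k;

    protected AbstractKMeansSketch(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be strictly positive: " + k);
        }
        this.k = k;
    }

    int getK() {
        return k;
    }

    /**
     * Shared helper: index of the centre closest to the given point
     * (squared Euclidean distance), or -1 if there are no centres.
     */
    protected static int nearestCenter(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.size(); c++) {
            final double[] center = centers.get(c);
            double d = 0;
            for (int i = 0; i < point.length; i++) {
                final double diff = point[i] - center[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Each algorithm (full-batch, mini-batch, ...) supplies its own iteration scheme. */
    abstract List<double[]> cluster(List<double[]> points);
}
```

[With a utility class, the helper would have to be "public" to be shared
across classes; with a base class it can be "protected", although that
still has to be maintained as part of the API.]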

Best regards,
Gilles

> [...]

Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-25 Thread Gilles Sadowski

Hello.

[Side question: Do you send a copy of your posts directly to me?
If so, that is not necessary and is even annoying because when I
hit "reply" in my mail client, the conversation continues off-list...]

On Tue, 25 Feb 2020 at 14:53, CT wrote:
>
> Hi Gilles:
>   Sorry for my unfamiliarity with contributing. I started a new PR for most
> of your suggestions:
> https://github.com/apache/commons-math/pull/120

Sorry for seemingly nit-picking but the global issue is the same
as with PR #118: It contains too many unrelated changes.
There should be *one* PR for each batch of significant changes.
More precisely, for this work, that would entail (a.o. things), one
PR for each of the following:
 * Class "ClusterUtils" (design yet to be discussed)
 * Factoring out whatever code is necessary for your proposed
feature (that would otherwise be duplicated)

We try and avoid bloating the API, hence the changes which
I've suggested:
 * remove the constructors with default parameters

Overall, each PR should probably contain a single commit, in
order to ease review.

> I still have one question below:
>
> -- Original --
> From: "Gilles Sadowski";
> Date: Mon, Feb 24, 2020 09:52 PM
> To: "dev";
> Subject: Re: [math] Discuss: New feature MiniBatchKMeansClusterer
>
> >Hi.
> >
> >On Sat, 22 Feb 2020 at 14:37, CT wrote:
> >>
> >> Hi Gilles:
> >>   I really appreciate your patience in helping me to solve the mail sending
> >> problem. For this mail, I have set the only charset-related setting to
> >> "Use Unicode".
> >
> >The content seems fine.  However, there is something wrong: replying to your
> >message sends to you instead of the mailing list...
> >
> >>   I have created a pull request:
> >> https://github.com/apache/commons-math/pull/118
> >
> >That PR fails the build:
> >
> > https://travis-ci.org/apache/commons-math/builds/653293451?utm_source=github_status&utm_medium=notification
> >
> >Also, the code doesn't take into account all the remarks which I
> >made in previous comments (e.g. file JIRA reports to allow tracking
> >of changes to existing code and not mix those with new code).
> >
>
> I did not get what kinds of problems should be tracked in JIRA.
> In my opinion, all the changes are related to my PR.

Yes, they are related but they could be provided in more focused
PRs as explained above.

> The JIRA issue about Feature MiniBatch has been closed.

It is not closed:
   https://issues.apache.org/jira/projects/MATH/issues/MATH-1509

> Should I file a new issue?

You should create one JIRA issue per self-contained change.
You could also create "sub-tasks" of MATH-1509 (for changes that
are strongly related to your contribution).

>
> >In addition, you should avoid mixing massive cosmetic changes (e.g.
> >Javadoc reformatting) within a commit that contains other types of
> >change:
> >
> > https://github.com/apache/commons-math/pull/118/commits/47a055e6264d084547854f9290461f020f2131cf
> >
> >>   And a comparison between KMeans and MiniBatchKMeans by using
> >> "org.apache.commons.math4.userguide.ClusterAlgorithmComparison":
> >
> >Functionality seems to work fine, but ability to review incremental
> >changes is extremely important in a collaborative project.
> >
>
> Thanks for the reminder; I created a new PR that addresses the problems one
> by one.

I hope that I've now fixed the misunderstanding,
Gilles

> > [...]


Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-24 Thread Gilles Sadowski

Hi.

On Sat, 22 Feb 2020 at 14:37, CT wrote:
>
> Hi Gilles:
>   I really appreciate your patience in helping me to solve the mail sending
> problem. For this mail, I have set the only charset-related setting to
> "Use Unicode".

The content seems fine.  However, there is something wrong: replying to your
message sends to you instead of the mailing list...

>   I have created a pull request: 
> https://github.com/apache/commons-math/pull/118

That PR fails the build:
   
https://travis-ci.org/apache/commons-math/builds/653293451?utm_source=github_status&utm_medium=notification

Also, the code doesn't take into account all the remarks which I
made in previous comments (e.g. file JIRA reports to allow tracking
of changes to existing code and not mix those with new code).

In addition, you should avoid mixing massive cosmetic changes (e.g.
Javadoc reformatting) within a commit that contains other types of
change:

https://github.com/apache/commons-math/pull/118/commits/47a055e6264d084547854f9290461f020f2131cf

>   And a comparison between KMeans and MiniBatchKMeans by using
> "org.apache.commons.math4.userguide.ClusterAlgorithmComparison":

Functionality seems to work fine, but ability to review incremental
changes is extremely important in a collaborative project.

Thanks,
Gilles


>
> Hello.
>
> On Tue, 18 Feb 2020 at 04:49, 陈 涛 wrote:
> >
> > Hi Gilles:
> >
> >    I really do not know if anyone received my last mail; no one replied to me
> > for a long time, so I am sending it again and copying you with another email
> > system.
>
> Sorry for the delay. :-}
>
> >
> > > Some remarks:
> >
> > > * I didn't get why the "KMeansPlusPlusCentroidInitializer" class
> > > does not call the existing "KMeansPlusPlusClusterer".
> > > Code seems duplicated: As there is a case for reuse, the currently
> > > "private" centroid initialization code should be factored out.
> >
> > This is alpha version for discuss the "MiniBatchKMeansClusterer" algorithm,
>
> I guess that you mean that we discuss your implementation of the
> algorithm referenced in the Javadoc.
>
> > and when "centroidOf" is extracted from "KMeansPlusPlusClusterer",
> >
> > the "KMeansPlusPlusClusterer" is not "KMeansPlusPlusClusterer" anymore but 
> > "KMeansClusterer",
>
> I don't follow.
>
> >
> > this is a significant change, so I did not refactor it.
>
> Significant changes are welcome (since the next release will contain
> other major changes anyways) if they improve the code base (like e.g.
> reducing code duplication).
>
> >
> >
> >
> > > * In "CentroidInitializer", I'd rename "chooseCentroids" to emphasize
> > > that a computation is performed (as opposed to selecting from an
> > > existing data structure).
>
> I think I'd prefer "selectCentroids".
>
> >
> > It is extract from "KMeansPlusPlusClusterer.centroidOf", should remain be 
> > "centroidOf"?
>
> I don't understand.
>
> It would be easier if you create a pull request, so that we can clearly see
> what codes are added/removed/changed.
>
> >
> > The subclass "RandomCentroidInitializer" and 
> > "KMeansPlusPlusCentroidInitializer" indicate the algorithm used.
> >
> >
> > > * Not convinced that there should be so many constructors (in most
> > > cases, it makes no sense to pick default values that are likely to
> > > be heavily application-dependent).
> >
> > I can add more constructors.
>
> No, the constructors with default values clutter the API, for
> no obvious gain IMHO.
> [If the default values make sense, they must be documented.]
>
> >
> > I'd like a builder class more than constructors, but does not meet the 
> > historical code style.
>
> Now is the time for improving the API.
> It would be quite helpful to create a report on JIRA with "sub-tasks"
> for all such API proposed changes.
>
> > > * Input should generally be validated: e.g. the maximum number of
> > > iterations should not be changed unwittingly; rather, an exception
> > > should be raised if the user passed a negative value.
> >
> > Thanks for your advices, I will improve these.
> >
> > > Could be nice to illustrate (not just with a picture, but in a table
> > > with entries average over several runs) the differences in result
> > > between the implementations, using various setups (number of
> > > clusters, stopping criterion, etc

Re: [numbers] Release?

2020-02-21 Thread Gilles Sadowski

On Sat, 22 Feb 2020 at 01:30, Alex Herbert wrote:
>
>
>
> > On 22 Feb 2020, at 00:29, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Fri, 21 Feb 2020 at 23:15, Matt Juntunen wrote:
> >>
> >> Are we waiting on anything for a numbers release?
> >
> > I don't think so.
>
> Are you talking about a beta release where the API is not yet frozen?

Yes.

> I’m still testing versions of LinearCombination. But from the discussion on 
> NUMBERS-142 [1] it seems the choice may be to just change the current class 
> to use a more precise method. It will be slower than the current method but 
> will have an ensured accuracy of 1 ULP. It will be much faster than 
> BigDecimal. All the testing implementations can go into the examples module 
> for reference.
>
> I have 1 PR for Complex to add an internal version of Math.hypot 
> (NUMBERS-143). I’ll go over this soon and bring it in. The method is faster 
> and more accurate than Math.hypot.
>
> I think Complex is ISO C99 compliant and quite robust to edge cases. The 
> javadoc needs a second pass and then an internal rearrangement of the code 
> layout. I’ve left this until last so that the git change history is clear. 
> But the methods and API are done.
>
> Then there is the implementation of ComplexList for storing and working with 
> many complex numbers. This would be a replacement for part of 
> numbers.complex.stream.ComplexUtils. The question is should this part of the 
> API be established before any release? If a beta then we can remove redundant 
> methods from ComplexUtils later.

I would not include the "commons-numbers-complex-streams" module
(IIRC you mentioned that for performance, "ComplexList" should be in
the same module as "Complex").

Regards,
Gilles

>
> Alex
>
> [1] https://issues.apache.org/jira/browse/NUMBERS-142
>
>
> >
> > Best,
> > Gilles
> >
> >>
> >>> [...]


Re: [numbers] Release?

2020-02-21 Thread Gilles Sadowski

Hi.

On Fri, 21 Feb 2020 at 23:15, Matt Juntunen wrote:
>
> Are we waiting on anything for a numbers release?

I don't think so.

Best,
Gilles

>
> > [...]


Re: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-19 Thread Gilles Sadowski

[Re-sending post so that it is attached to the original thread.]

Hello.

On Tue, 18 Feb 2020 at 04:49, 陈 涛 wrote:
>
> Hi Gilles:
>
>    I really do not know if anyone received my last mail; no one replied to me
> for a long time, so I am sending it again and copying you with another email
> system.

Sorry for the delay. :-}

>
> > Some remarks:
>
> > * I didn't get why the "KMeansPlusPlusCentroidInitializer" class
> > does not call the existing "KMeansPlusPlusClusterer".
> > Code seems duplicated: As there is a case for reuse, the currently
> > "private" centroid initialization code should be factored out.
>
> This is alpha version for discuss the "MiniBatchKMeansClusterer" algorithm,

I guess that you mean that we discuss your implementation of the
algorithm referenced in the Javadoc.

> and when "centroidOf" is extracted from "KMeansPlusPlusClusterer",
>
> the "KMeansPlusPlusClusterer" is not "KMeansPlusPlusClusterer" anymore but 
> "KMeansClusterer",

I don't follow.

>
> this is a significant change, so I did not refactor it.

Significant changes are welcome (since the next release will contain
other major changes anyways) if they improve the code base (like e.g.
reducing code duplication).

>
>
>
> > * In "CentroidInitializer", I'd rename "chooseCentroids" to emphasize
> > that a computation is performed (as opposed to selecting from an
> > existing data structure).

I think I'd prefer "selectCentroids".

>
> It is extract from "KMeansPlusPlusClusterer.centroidOf", should remain be 
> "centroidOf"?

I don't understand.

It would be easier if you create a pull request, so that we can clearly see
what codes are added/removed/changed.

>
> The subclass "RandomCentroidInitializer" and 
> "KMeansPlusPlusCentroidInitializer" indicate the algorithm used.
>
>
> > * Not convinced that there should be so many constructors (in most
> > cases, it makes no sense to pick default values that are likely to
> > be heavily application-dependent).
>
> I can add more constructors.

No, the constructors with default values clutter the API, for
no obvious gain IMHO.
[If the default values make sense, they must be documented.]
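
[Illustration only: a sketch of the alternative, i.e. a single explicit
constructor that validates its arguments and raises an exception instead
of silently adjusting them.  The class "ClustererConfigSketch" and its
parameters are hypothetical; whether zero should be accepted for the
maximum number of iterations is an assumption of this example.]

```java
/** Hypothetical configuration holder; not the actual MiniBatchKMeansClusterer. */
final class ClustererConfigSketch {
    private final int k;
    private final int maxIterations;
    private final int batchSize;

    /**
     * Single constructor: every parameter is application-dependent,
     * so none is given a default value.
     *
     * @throws IllegalArgumentException if k or batchSize is not strictly
     * positive, or if maxIterations is negative.
     */
    ClustererConfigSketch(int k, int maxIterations, int batchSize) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be strictly positive: " + k);
        }
        if (maxIterations < 0) {
            // Do not silently replace the value: report the error to the caller.
            throw new IllegalArgumentException("maxIterations must not be negative: " + maxIterations);
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be strictly positive: " + batchSize);
        }
        this.k = k;
        this.maxIterations = maxIterations;
        this.batchSize = batchSize;
    }

    int getK() {
        return k;
    }

    int getMaxIterations() {
        return maxIterations;
    }

    int getBatchSize() {
        return batchSize;
    }
}
```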

>
> I'd like a builder class more than constructors, but does not meet the 
> historical code style.

Now is the time for improving the API.
It would be quite helpful to create a report on JIRA with "sub-tasks"
for all such API proposed changes.

> > * Input should generally be validated: e.g. the maximum number of
> > iterations should not be changed unwittingly; rather, an exception
> > should be raised if the user passed a negative value.
>
> Thanks for your advices, I will improve these.
>
> > Could be nice to illustrate (not just with a picture, but in a table
> > with entries average over several runs) the differences in result
> > between the implementations, using various setups (number of
> > clusters, stopping criterion, etc.).
>
> I will make more tests, include benchmarks.
>
> It is a challenge for me to generate the various kinds of test data,
>
> could anybody supply me with the test data for this comparison:
> http://commons.apache.org/proper/commons-math/userguide/ml.html

They are generated programmatically; code is here:

https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/userguide/java/org/apache/commons/math4/userguide

[I've just updated the code so that it compiles and runs using
the new dependencies (see the "README" file).]

> > "MT_64" is probably not the best default.  And this is one of the
> > parameters for which there should not be a default IMO.
>
> I will do more tests

You don't need to test the generators; users should choose
by themselves (from those provided in "Commons RNG").
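
[Illustration only: a sketch of a component that has no default generator
and receives one from the caller.  To keep the example free of
dependencies, "java.util.Random" stands in for the generator type; in the
library this role would be played by a "Commons RNG" provider.  The class
"RandomInitSketch" and method "pickIndices" are hypothetical.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical initializer that draws k distinct point indices at random. */
final class RandomInitSketch {
    /** Source of randomness, chosen by the caller (no default). */
    private final Random rng;

    RandomInitSketch(Random rng) {
        if (rng == null) {
            throw new NullPointerException("rng");
        }
        this.rng = rng;
    }

    /** Returns k distinct indices in [0, n), in random order. */
    List<Integer> pickIndices(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("k out of range [0, " + n + "]: " + k);
        }
        final List<Integer> indices = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        // Shuffle with the caller-supplied generator, then keep the first k.
        Collections.shuffle(indices, rng);
        return new ArrayList<>(indices.subList(0, k));
    }
}
```

[Since the generator is passed in, a caller who needs reproducible runs
can supply a seeded instance, and the clustering code does not have to
pick one on the user's behalf.]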

>
> > [Note: there are spurious characters in your message (see e.g. the
> > paragraph quoted just above) that make it difficult to read.]
>
> I had formatted my mail well in my mailbox; it may have been changed by the
> mailing list service.
>
> I will try various mail editors. It would be helpful if you told me which
> mail editor works well with the ML.

It's probably an encoding thing (setting it to "UTF-8" should be
fine).

Best regards,
Gilles

> > [...]


Fwd: [math] Discuss: New feature MiniBatchKMeansClusterer

2020-02-18 Thread Gilles Sadowski

Hello.

On Tue, 18 Feb 2020 at 04:49, 陈 涛 wrote:
>
> Hi Gilles:
>
>    I really do not know if anyone received my last mail; no one replied to me
> for a long time, so I am sending it again and copying you with another email
> system.

Sorry for the delay. :-}

>
> > Some remarks:
>
> > * I didn't get why the "KMeansPlusPlusCentroidInitializer" class
> > does not call the existing "KMeansPlusPlusClusterer".
> > Code seems duplicated: As there is a case for reuse, the currently
> > "private" centroid initialization code should be factored out.
>
> This is alpha version for discuss the "MiniBatchKMeansClusterer" algorithm,

I guess that you mean that we discuss your implementation of the
algorithm referenced in the Javadoc.

> and when "centroidOf" is extracted from "KMeansPlusPlusClusterer",
>
> the "KMeansPlusPlusClusterer" is not "KMeansPlusPlusClusterer" anymore but 
> "KMeansClusterer",

I don't follow.

>
> this is a significant change, so I did not refactor it.

Significant changes are welcome (since the next release will contain
other major changes anyways) if they improve the code base (like e.g.
reducing code duplication).

>
>
>
> > * In "CentroidInitializer", I'd rename "chooseCentroids" to emphasize
> > that a computation is performed (as opposed to selecting from an
> > existing data structure).

I think I'd prefer "selectCentroids".

>
> It is extract from "KMeansPlusPlusClusterer.centroidOf", should remain be 
> "centroidOf"?

I don't understand.

It would be easier if you create a pull request, so that we can clearly see
what codes are added/removed/changed.

>
> The subclass "RandomCentroidInitializer" and 
> "KMeansPlusPlusCentroidInitializer" indicate the algorithm used.
>
>
> > * Not convinced that there should be so many constructors (in most
> > cases, it makes no sense to pick default values that are likely to
> > be heavily application-dependent).
>
> I can add more constructors.

No, the constructors with default values clutter the API, for
no obvious gain IMHO.
[If the default values make sense, they must be documented.]

>
> I'd like a builder class more than constructors, but does not meet the 
> historical code style.

Now is the time for improving the API.
It would be quite helpful to create a report on JIRA with "sub-tasks"
for all such API proposed changes.

> > * Input should generally be validated: e.g. the maximum number of
> > iterations should not be changed unwittingly; rather, an exception
> > should be raised if the user passed a negative value.
>
> Thanks for your advices, I will improve these.
>
> > Could be nice to illustrate (not just with a picture, but in a table
> > with entries average over several runs) the differences in result
> > between the implementations, using various setups (number of
> > clusters, stopping criterion, etc.).
>
> I will make more tests, include benchmarks.
>
> It is a challenge for me to generate the various kinds of test data,
>
> could anybody supply me with the test data for this comparison:
> http://commons.apache.org/proper/commons-math/userguide/ml.html

They are generated programmatically; code is here:

https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/userguide/java/org/apache/commons/math4/userguide

[I've just updated the code so that it compiles and runs using
the new dependencies (see the "README" file).]

> > "MT_64" is probably not the best default.  And this is one of the
> > parameters for which there should not be a default IMO.
>
> I will do more tests

You don't need to test the generators; users should choose
by themselves (from those provided in "Commons RNG").

>
> > [Note: there are spurious characters in your message (see e.g. the
> > paragraph quoted just above) that make it difficult to read.]
>
> I had formatted my mail well in my mailbox; it may have been changed by the
> mailing list service.
>
> I will try various mail editors. It would be helpful if you told me which
> mail editor works well with the ML.

It's probably an encoding thing (setting it to "UTF-8" should be
fine).

Best regards,
Gilles

> > [...]


Re: [numbers] Release?

2020-02-13 Thread Gilles Sadowski

Hi.

On Thu, 23 Jan 2020 at 15:04, Matt Juntunen wrote:
>
> Hello,
>
> Any chance we can get a release (beta or full) for commons-numbers?

Non-exhaustive check-list is here:
https://issues.apache.org/jira/browse/NUMBERS-25

All "crosses" to be changed to "checks"?

Anyways, +1 to making beta releases, even while still working
on features that we want in version 1.0 "non-beta".

> As I mentioned in another thread, commons-geometry is ready for a
> beta release

+1
And while doing so, it would be nice to prepare comparison
benchmarks with CM (v3.6.1).

> but we need commons-numbers to be released before we can do that.

Indeed.

Gilles

>
> Regards,
> Matt J


Re: [All] Do we want to apply for "mentored" contributions?

2020-02-05 Thread Gilles Sadowski

Hi.

On Tue, 4 Feb 2020 at 15:08, Matt Sicker wrote:
>
> I’d honestly expect that several components here are prime candidates for a
> more student-heavy audience,

So a related thought to clarify is whether they are welcome here,
in practice.  Indeed, through last year's GSoC experience, we could
assert that goodwill and openness was not enough to reach the target.

> particularly the more academic-aligned
> components like the math ones in particular.

Not sure that they are better candidates than any other "Commons"
project.  Last year, a lot of time (in retrospect, far, far, too much)
was devoted to "chatting" about the tools (IDE, git), and to try and
enforce good programming practice (coding style, design, unit
testing).  [I mean: things that are common to all projects.]
Whereas actual code production has not matched expectation.

> The components are also low
> level enough to not require experience in any specific frameworks which is
> nice.

I thought that too; however, as mentioned above, it is not sufficient.
Or more precisely, we need ways to ensure that mentors don't spend
their time rewriting tutorials about setting up one's environment.

> I think the difficult part is simply curating enough starter tasks
> for one or more applicants to complete in order to choose an intern.

Indeed.   And it was my impression (perhaps wrong), that such
"starter" tasks are in fact easier to find within the non math-related
components.  Hence the suggestion that internship must be supported
by "Commons" as a whole.

Regards,
Gilles

>
> On Tue, Feb 4, 2020 at 05:47 Gilles Sadowski  wrote:
>
> > Hello.
> >
> > Is "Commons" willing to set up itself for welcoming new
> > people who, in order to contribute to the projects, might
> > need more support than the usual asynchronous review
> > of patches?
> >
> > The ASF participates in GSoC[1] and Outreachy[2] and
> > some Apache projects seem well prepared for dealing with
> > the mentoring requirements and application selection process.
> >
> > Last year, we[3] participated in GSoC, with mixed results.
> > Maybe it was partly due to the lack of experience with these
> > programs, especially on how to gauge the candidates (wrt
> > to the expected benefit for the project).
> >
> > Some people start to ask questions about their possible
> > application.[4][5]
> > Is "Commons" too complicated for the target audience of
> > those initiatives?
> >
> > Regards,
> > Gilles
> >
> > [1] https://summerofcode.withgoogle.com/
> > [2] https://www.outreachy.org/
> > [3] Rob Tompkins, Eric Barnhill, Alex Herbert, and I.
> > [4] https://markmail.org/message/n5prdwkaukw5ji37
> > [5]
> > https://issues.apache.org/jira/browse/NUMBERS-70?focusedCommentId=17028479&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17028479
> >


[All] Do we want to apply for "mentored" contributions?

2020-02-04 Thread Gilles Sadowski

Hello.

Is "Commons" willing to set up itself for welcoming new
people who, in order to contribute to the projects, might
need more support than the usual asynchronous review
of patches?

The ASF participates in GSoC[1] and Outreachy[2] and
some Apache projects seem well prepared for dealing with
the mentoring requirements and application selection process.

Last year, we[3] participated in GSoC, with mixed results.
Maybe it was partly due to the lack of experience with these
programs, especially on how to gauge the candidates (wrt
to the expected benefit for the project).

Some people start to ask questions about their possible
application.[4][5]
Is "Commons" too complicated for the target audience of
those initiatives?

Regards,
Gilles

[1] https://summerofcode.withgoogle.com/
[2] https://www.outreachy.org/
[3] Rob Tompkins, Eric Barnhill, Alex Herbert, and I.
[4] https://markmail.org/message/n5prdwkaukw5ji37
[5]
https://issues.apache.org/jira/browse/NUMBERS-70?focusedCommentId=17028479&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17028479


Re: [numbers] NUMBERS-40: Exception Consistency

2020-02-02 Thread Gilles Sadowski

Hi.

2020-01-26 16:54 UTC+01:00, Matt Juntunen :
> Hello,
>
> I'm looking into NUMBERS-40, which suggests that the exception behavior of
> commons-numbers (specifically the gamma package) needs to be made more
> consistent. Below is a summary of the public exception types explicitly
> thrown by each module.
>
> arrays
> IndexOutOfBoundsException
> IllegalArgumentException
>
> combinatorics
> IllegalArgumentException
> NoSuchElementException
> UnsupportedOperationException
>
> complex
> NumberFormatException
> IllegalArgumentException
>
> complex-streams
> IllegalArgumentException
>
> core
> ArithmeticException
> IllegalArgumentException
>
> fraction
> ArithmeticException
> IllegalArgumentException
>
> gamma
> IllegalArgumentException

Some methods throw "ArithmeticException" while others throw
"IllegalArgumentException".  IIRC, my issue was whether there
were cases where the behaviour is not consistent from a
user POV (IOW, where the same behaviour would be expected).

[Note: I did not review all the packages.  But I don't recall that
there were issues of this kind.]
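
[Illustration only: one possible convention that would look consistent
from a user POV, namely "IllegalArgumentException" when the argument is
outside the domain of the function and "ArithmeticException" when the
result cannot be represented.  This is an assumption made for the
example, not a statement about what the "gamma" package does; the class
"FactorialSketch" is hypothetical.]

```java
/** Hypothetical illustration of a consistent exception policy. */
final class FactorialSketch {
    private FactorialSketch() {}

    /**
     * Computes n!.
     *
     * @throws IllegalArgumentException if n is negative (invalid input).
     * @throws ArithmeticException if the result overflows a long.
     */
    static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative argument: " + n);
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            // Math.multiplyExact throws ArithmeticException on overflow.
            result = Math.multiplyExact(result, (long) i);
        }
        return result;
    }
}
```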

Gilles

>
> primes
> IllegalArgumentException
>
> quaternion
> NumberFormatException
> IllegalArgumentException
> IllegalStateException
>
> rootfinder
> IllegalArgumentException
>
>
> Nothing in this list strikes me as being inconsistent. The types are all
> standard JDK exception types and seem to be used appropriately, IMO. Is
> there any work that needs to be done on this issue?
>
> Regards,
> Matt J
>
>
>


Re: [LANG] Start contributing

2020-01-29 Thread Gilles Sadowski

Hello.

On Wed, 29 Jan 2020 at 03:49, Asanka Amarasinghe wrote:
>
> Hi,
>
> Can anyone guide me ?

If you are interested in
   https://issues.apache.org/jira/projects/LANG/issues/LANG-1499
you could maybe provide your suggestion on how to resolve
the issue in a comment over there.
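
[Illustration only, for readers who have not seen the report: the findings
quoted further down describe a comparison that looks only at field names
and values and ignores the class hierarchy.  The sketch below shows that
idea with plain "java.lang.reflect"; it is NOT the implementation of
"EqualsBuilder", and all class names in it are hypothetical.  It only
inspects the fields declared by the runtime class (not inherited ones),
and arrays would be compared by reference.]

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; NOT the implementation of EqualsBuilder. */
final class FieldwiseEqualsSketch {
    private FieldwiseEqualsSketch() {}

    /** Collects the declared instance fields of the object's class, keyed by name. */
    static Map<String, Object> fieldValues(Object o) {
        final Map<String, Object> values = new HashMap<>();
        for (Field f : o.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue; // Skip compiler-generated and static fields.
            }
            try {
                f.setAccessible(true);
                values.put(f.getName(), f.get(o));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    /** True if both objects declare the same field names with equal values. */
    static boolean sameFieldsAndValues(Object lhs, Object rhs) {
        return fieldValues(lhs).equals(fieldValues(rhs));
    }
}

/** Two unrelated classes that happen to declare the same field. */
final class Celsius {
    private final double degrees;

    Celsius(double degrees) {
        this.degrees = degrees;
    }
}

final class Fahrenheit {
    private final double degrees;

    Fahrenheit(double degrees) {
        this.degrees = degrees;
    }
}
```

[With this sketch, "new Celsius(10)" and "new Fahrenheit(10)" compare as
equal although they are unrelated types, which may be a point to raise
when discussing the expected behaviour in the JIRA comment.]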

HTH,
Gilles

>
> On Sun, Jan 26, 2020, 7:33 PM Asanka Amarasinghe 
> wrote:
>
> > Hi,
> >
> > I started with  LANG-1499  (
> > https://issues.apache.org/jira/projects/LANG/issues/LANG-1499 ) . I'm
> > looking for a mentor to work with, get to know the rules and practices
> > around. Below are my findings on LANG-1499.
> >
> >
> > - In the current implementation, if the given *LHS* and *RHS* objects are
> >   not in the same hierarchy (parent-child classes), then the
> >   *EqualsBuilder.reflectionEquals* method always returns false
> >   without further checks.
> > - The documentation (
> >   https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/builder/EqualsBuilder.html)
> >   says "*This method uses reflection to determine if the two Objects
> >   are equal.*"  It does not describe in more detail how they are checked for
> >   equality.
> > - The example in LANG-1499 implies that any two instances
> >   with the same *fields* and the same *values for those fields* should
> >   compare as equal (since, in the end, every class extends the Object class).
> >
> > Please let me know if you can mentor me on this.
> >
> >
> > *Best Regards **Asanka Amarasinghe*
> >
> >
> > On Sun, Jan 26, 2020 at 2:26 PM Asanka Amarasinghe <
> > ahamarasin...@gmail.com> wrote:
> >
> >> Hi Gilles,
> >>
> >> Thanks for the support. I was able to setup work space and build the
> >> project
> >>
> >> *Best Regards *
> >> *Asanka Amarasinghe*
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Jan 25, 2020 at 6:21 PM Gilles Sadowski 
> >> wrote:
> >>
> >>> Hi.
> >>>
> >>> 2020-01-25 7:00 UTC+01:00, Asanka Amarasinghe :
> >>> > Hi,
> >>> >
> >>> > I'm new to open source community, and I would like to contribute to
> >>> > commons.lang project. I read all the materials for beginners and I
> >>> already
> >>> > joined JIRA issue tracker.
> >>>
> >>> Welcome.
> >>>
> >>> >
> >>> > Could someone guide me to where I can find a documentation for work
> >>> space
> >>> > setup for this project? I believe if I could go through the code then
> >>> I can
> >>> > come up with a proposal for an issue listed on the tracker.
> >>>
> >>> The command for downloading the code is provided at that page:
> >>> http://commons.apache.org/proper/commons-lang/scm.html
> >>>
> >>> >
> >>> > Pardon if this is a dummy query.
> >>>
> >>> It's not.  Don't hesitate to ask if you find that some
> >>> documentation for beginners is difficult to find, unclear,
> >>> or outdated.
> >>>
> >>> Regards,
> >>> Gilles
> >>>
> >>> >
> >>> > *Best Regards **Asanka Amarasinghe*
> >>> >
> >>>


Re: [All] Sonarcloud reports zero coverage

2020-01-27 Thread Gilles Sadowski

Hi.

On Mon, 27 Jan 2020 at 09:56, Amey Jadiye wrote:
>
> On Mon, Jan 27, 2020, 4:57 AM Gilles Sadowski  wrote:
>
> > Hello.
> >
> Hi,
>
> >
> > On Sun, 26 Jan 2020 at 18:06, Amey Jadiye wrote:
> > >
> > > For almost all the repos [1][2] the coverage suddenly dropped. I can see an event
> > > in the coverage activity labelled "Quality Profile: Changes in 'Sonar way'
> > > (Java)".
> > > I see that some changes were made to the profile [3] on 7th Jan. Could it be
> > > that this change broke the coverage?
> > >
> > > [1]
> > >
> > https://sonarcloud.io/project/activity?custom_metrics=coverage=custom=commons-numbers
> > > [2]
> > >
> > https://sonarcloud.io/project/activity?custom_metrics=coverage=custom=commons-rng
> > > [3]
> > >
> > https://sonarcloud.io/organizations/apache/quality_profiles/changelog?language=java=Sonar+way
> >
> > Thanks for looking into it.
> > I too had noticed that "something" is reported as changed, but
> > couldn't figure out what...
> >
> > Regards,
> > Gilles
> >
>
> That "something" is also given in the [3] link I provided.

Sorry, but I don't see/understand how the mentioned changes would
necessarily lead to coverage being set to zero...

> :) Anyway,
> an infra admin can revert/correct it, as I see that the change is breaking many
> Apache projects other than Commons.

If the issue requires intervention from INFRA, please comment
on the corresponding issue (e.g. listing other projects that fail due
to the same change, and how to correct it).

Thanks,
Gilles

>
> Regards,
> Amey
>
> >
> > > Regards,
> > > Amey
> > >
> > > On Wed, Jan 15, 2020 at 9:17 PM Gilles Sadowski 
> > > wrote:
> > >
> > > > Hello.
> > > >
> > > > "Sonar" reports are created for several projects, a.o.
> > > > https://sonarcloud.io/dashboard?id=commons-numbers
> > > > https://sonarcloud.io/dashboard?id=commons-geometry
> > > > https://sonarcloud.io/dashboard?id=commons-rng
> > > > https://sonarcloud.io/dashboard?id=commons-statistics
> > > > for which coverage is now reported as 0%, although it was reported
> > > > correctly earlier.
> > > >
> > > > Regards,
> > > > Gilles
> > > >
> >


Re: [All] Sonarcloud reports zero coverage

2020-01-26 Thread Gilles Sadowski

Hello.

On Sun, 26 Jan 2020 at 18:06, Amey Jadiye wrote:
>
> For almost all the repos [1][2] the coverage suddenly dropped. I can see an event
> in the coverage activity labelled "Quality Profile: Changes in 'Sonar way'
> (Java)".
> I see that some changes were made to the profile [3] on 7th Jan. Could it be
> that this change broke the coverage?
>
> [1]
> https://sonarcloud.io/project/activity?custom_metrics=coverage=custom=commons-numbers
> [2]
> https://sonarcloud.io/project/activity?custom_metrics=coverage=custom=commons-rng
> [3]
> https://sonarcloud.io/organizations/apache/quality_profiles/changelog?language=java=Sonar+way

Thanks for looking into it.
I too had noticed that "something" is reported as changed, but
couldn't figure out what...

Regards,
Gilles

> Regards,
> Amey
>
> On Wed, Jan 15, 2020 at 9:17 PM Gilles Sadowski 
> wrote:
>
> > Hello.
> >
> > "Sonar" reports are created for several projects, a.o.
> > https://sonarcloud.io/dashboard?id=commons-numbers
> > https://sonarcloud.io/dashboard?id=commons-geometry
> > https://sonarcloud.io/dashboard?id=commons-rng
> > https://sonarcloud.io/dashboard?id=commons-statistics
> > for which coverage is now reported as 0%, although it was reported
> > correctly earlier.
> >
> > Regards,
> > Gilles
> >


Re: [All] Sonarcloud reports zero coverage

2020-01-26 Thread Gilles Sadowski

Hi.

On Sun, 26 Jan 2020 at 17:05, Matt Juntunen wrote:
>
> Any word on this?

I've reported this to INFRA:
https://issues.apache.org/jira/browse/INFRA-19763

See comment there: Whoever is "admin", please follow up on their
inquiry.

Thanks,
Gilles

>
> -Matt
> ____
> From: Gilles Sadowski 
> Sent: Wednesday, January 15, 2020 10:47 AM
> To: Commons Developers List 
> Subject: [All] Sonarcloud reports zero coverage
>
> Hello.
>
> "Sonar" reports are created for several projects, a.o.
> https://sonarcloud.io/dashboard?id=commons-numbers
> https://sonarcloud.io/dashboard?id=commons-geometry
> https://sonarcloud.io/dashboard?id=commons-rng
> https://sonarcloud.io/dashboard?id=commons-statistics
> for which coverage is now reported as 0%, although it was reported
> correctly earlier.
>
> Regards,
> Gilles


Re: Working on a Stream-based Java statistical processing library

2020-01-25 Thread Gilles Sadowski

Hello.

On Sat, 25 Jan 2020 at 17:56, Kartik Ohri wrote:
>
> Hi, I am interested in working on a Stream-based Java statistical
> processing library

Thanks for your interest in contributing.

> as described here at
> https://issues.apache.org/jira/browse/STATISTICS-7 . Can someone point
> me to how I can get started ?

That depends on what you mean.
The source code is here:
https://gitbox.apache.org/repos/asf?p=commons-statistics.git;a=tree

The idea is to add a maven module for each type of functionality implemented
in the "org.apache.commons.math4.stat" package of "Commons Math" project:

https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/main/java/org/apache/commons/math4/stat
First guess would be to have a module for each sub-package, i.e.
 * commons-statistics-correlation
 * commons-statistics-descriptive
 * commons-statistics-inference
 * commons-statistics-interval
 * commons-statistics-ranking
 * commons-statistics-regression

You would start with the lowest-level functionality (i.e. the
packages/modules that do not depend on any other).
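
As a rough idea of what "Stream-based" could mean for a low-level "descriptive" module, here is a minimal sketch. The class "StreamingMoments" is hypothetical and is not part of Commons Statistics or Commons Math. It assumes a mutable accumulator (Welford-style update, with a pairwise merge step for parallel streams) plugged into the standard "DoubleStream.collect" method.

```java
import java.util.stream.DoubleStream;

/** Hypothetical accumulator for count, mean and sample variance. */
final class StreamingMoments {
    private long n;
    private double mean;
    private double m2; // Sum of squared deviations from the mean.

    /** Adds one value (Welford update). */
    void accept(double x) {
        n++;
        final double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another accumulator into this one (needed for parallel streams). */
    void combine(StreamingMoments other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        final long total = n + other.n;
        final double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * n * other.n / total;
        mean += delta * other.n / total;
        n = total;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n > 0 ? mean : Double.NaN;
    }

    /** Sample variance (divides by n - 1); NaN for fewer than two values. */
    double getVariance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    /** Consumes the stream and returns the filled accumulator. */
    static StreamingMoments of(DoubleStream values) {
        return values.collect(StreamingMoments::new,
                              StreamingMoments::accept,
                              StreamingMoments::combine);
    }
}
```

Usage would be along the lines of "StreamingMoments.of(DoubleStream.of(2, 4, 4, 4, 5, 5, 7, 9)).getMean()". The actual design remains to be discussed on this list.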

Proposed design should be discussed on this mailing list.
Then, steps of the implementation should have a corresponding
ticket on the project's issue tracking JIRA project:
https://issues.apache.org/jira/browse/STATISTICS

Note: The "Commons Math" JIRA project:
https://issues.apache.org/jira/browse/MATH
contains identified issues with the "stat" package, so the refactoring
is more work than porting the classes.

If you have other questions, do not hesitate to ask.

Regards,
Gilles


Re: [LANG] Start contributing

2020-01-25 Thread Gilles Sadowski

Hi.

2020-01-25 7:00 UTC+01:00, Asanka Amarasinghe :
> Hi,
>
> I'm new to open source community, and I would like to contribute to
> commons.lang project. I read all the materials for beginners and I already
> joined JIRA issue tracker.

Welcome.

>
> Could someone guide me to where I can find a documentation for work space
> setup for this project? I believe if I could go through the code then I can
> come up with a proposal for an issue listed on the tracker.

The command for downloading the code is provided at that page:
http://commons.apache.org/proper/commons-lang/scm.html

>
> Pardon if this is a dummy query.

It's not.  Don't hesitate to ask if you find that some
documentation for beginners is difficult to find, unclear,
or outdated.

Regards,
Gilles

>
> *Best Regards **Asanka Amarasinghe*
>


Re: [numbers] Complex to use code ported with a permissive licence

2020-01-24 Thread Gilles Sadowski

Hi.

On Fri, 24 Jan 2020 at 13:49, Xu Jin wrote:
>
> I mean creating a non-Apache project (named tempProject0, for example),
> which contains the original license from that code (named license0, for
> example) and the modified source code of that function.

Is that really easier?

> Then apache-commons-numbers just depends on project tempProject0,

Given that "Commons Numbers" is meant to be a "zero dependency" project...

> and USES functions in tempProject0, but does not contain any code related to
> that source.
> In this way apache-commons-numbers is just a user of tempProject0, and thus does
> not need to add license0 to apache-commons-numbers's license.

But there is no problem adding "licence0" in the "LICENSE" file (if
the license is acceptable for an ASF project, as is the case here).

Regards,
Gilles

> Only tempProject0 has to make its license be license0, but that will not
> pollute apache-commons-numbers.
> Although I'm not sure whether it would be ethical to do so, it does not break any
> rules, IMO.
>
> From: Gilles Sadowski
> Sent: 24 January 2020 19:30
> To: Commons Developers List
> Subject: Re: [numbers] Complex to use code ported with a permissive licence
>
> On Fri, 24 Jan 2020 at 10:47, Xeno Amess wrote:
> >
> > the new project just uses the original license.
>
> What new project are you talking about?
> If it's an Apache project, it must use the Apache license; if it's not,
> we'd be in exactly the same position (i.e. determine how to comply
> with that license in an Apache project).
>
> Or I must be missing something...
>
> Gilles
>
> >
> > 
> > From: Gilles Sadowski 
> > Sent: Friday, January 24, 2020 5:43:49 PM
> > To: Commons Developers List 
> > Subject: Re: [numbers] Complex to use code ported with a permissive licence
> >
> > Hi.
> >
> > 2020-01-24 5:21 UTC+01:00, Xeno Amess :
> > > How about creating a new project which contains the modified source function
> > > and the original license, and in Commons we just invoke that function?
> > >
> >
> > Then, the same question(s) will be asked for that new project (?).
> >
> > Regards,
> > Gilles
> >


Re: [numbers] Complex to use code ported with a permissive licence

2020-01-24 Thread Gilles Sadowski

On Fri, 24 Jan 2020 at 10:47, Xeno Amess wrote:
>
> the new project just uses the original license.

What new project are you talking about?
If it's an Apache project, it must use the Apache license; if it's not,
we'd be in exactly the same position (i.e. determine how to comply
with that license in an Apache project).

Or I must be missing something...

Gilles

>
> ____
> From: Gilles Sadowski 
> Sent: Friday, January 24, 2020 5:43:49 PM
> To: Commons Developers List 
> Subject: Re: [numbers] Complex to use code ported with a permissive licence
>
> Hi.
>
> 2020-01-24 5:21 UTC+01:00, Xeno Amess :
> > How about creating a new project which contains the modified source function
> > and the original license, and in Commons we just invoke that function?
> >
>
> Then, the same question(s) will be asked for that new project (?).
>
> Regards,
> Gilles
>


Re: [numbers] Complex to use code ported with a permissive licence

2020-01-24 Thread Gilles Sadowski

Hi.

2020-01-24 5:21 UTC+01:00, Xeno Amess :
> How about creating a new project which contains the modified source function
> and the original license, and in Commons we just invoke that function?
>

Then, the same question(s) will be asked for that new project (?).

Regards,
Gilles


Re: [numbers] Complex to use code ported with a permissive licence

2020-01-24 Thread Gilles Sadowski

Hi.

2020-01-24 1:42 UTC+01:00, Alex Herbert :
>
>
>> On 24 Jan 2020, at 00:07, Gilles Sadowski  wrote:
>>
>> Hello.
>>
>> 2020-01-24 0:30 UTC+01:00, Alex Herbert :
>>> In short:
>>>
>>> - Math.hypot is required in Complex to compute sqrt(x^2 + y^2) without
>>> over/underflow to 1 ULP precision.
>>> - Complex also requires the same computation without the sqrt or
>>> over/underflow protection
>>> - I found the reference for Math.hypot and reimplemented the function
>>> - My port is 4x faster than Math.hypot and the same accuracy (or better)
>>> - I will add this as a private internal method in Complex
>>> - The source for the port requires the original licence is maintained
>>>
>>> The question is where to put the notice of the original license? Copying
>>> from commons RNG it would be in the source method javadoc and also in
>>> all
>>> LICENSE.txt files through the multi-module project. This seems excessive.
>>> I
>>> thought perhaps to include it only in numbers parent and then the
>>> complex
>>> module where it applies.
>>
>> IIUC, this would agree with the recommendations here:
>> http://www.apache.org/dev/licensing-howto.html
>>
>
> That talks about LICENCE in the entire product distribution. So that would
> be the archive available from the downloads page. Then it states the LICENSE
> should be at the top level of the source tree. So parent has a LICENSE that
> covers all the additions for each sub-module.

That is how I understand it too.

>
> It is not clear to me about whether all modules require the same licence. It
> states
>
> "LICENSE and NOTICE must exactly represent the contents of the distribution
> they reside in.”
>
> So I would assume each module should have a LICENSE and NOTICE that reflects
> its distribution which is the module jar distributed to maven.

I'd think so.

>
> So my interpretation is to have a LICENSE and NOTICE in each module where
> LICENSE can reflect the specific additions for that module. The parent root
> directory would combine all the additions from each LICENSE in all the
> sub-modules.

There is this note:
http://www.apache.org/legal/src-headers.html#header-existingcopyright

I'd assume that using a file from that code repository is something
that already occurred within Apache...

>
> This would mean the LICENSE addition for MersenneTwister can be dropped from
> all the commons modules except the commons-rng-core module.

What about module "core" being a required dependency for
module "simple"?

> Likewise I would
> add the short license for fdlibm to numbers in the parent dir and only in
> the complex module.

Sure (since no other module depends on module "complex").

>
> Please let me know if you interpret this differently.

Hopefully, someone reading here can provide some argument.
If not, better to post to the "legal" ML:
   http://www.apache.org/foundation/mailinglists.html#foundation-legal

>
>> [By the way, we should perhaps remove the ".txt" suffix.]
>
> Although the guidelines talk about LICENSE and NOTICE all the commons
> projects I checked have a .txt suffix. The commons-build plugin that
> generates the README.md file even adds links to LICENSE.txt.

If it's automated, it's just a matter of updating to the *preferred*
policy.

>
> Looking at other apache projects they show both forms, e.g.
>
> LICENSE
> beam, spark, tomcat
>
> LICENSE.txt
> cassandra, arrow, jackrabbit-oak
>
> So I don’t think changing the .txt suffix is required.

Not required, merely suggested (based on the reading mentioned
above).

> We are in the same
> boat as a lot of other projects and at least this is consistent across all
> of commons.

Those files are mandated by the ASF, not "Commons".

Best,
Gilles

>>
>> Regards,
>> Gilles
>>
>>>
>>>
>>> Background
>>>
>>> The complex class uses the Math.hypot(double, double) function to
>>> determine
>>> the absolute value of a complex number x + iy as sqrt(x^2 + y^2) without
>>> over/underflow to 1 ULP precision. This is used directly (in sqrt() and
>>> abs()) but also without the square root to compute x^2 + y^2 in the
>>> log()
>>> function. These functions also perform over/underflow protection and so
>>> ideally just require access to the same formula for high precision x^2 +
>>> y^2. This would enable consistency across the different method

Re: [numbers] Complex to use code ported with a permissive licence

2020-01-23 Thread Gilles Sadowski

Hello.

2020-01-24 0:30 UTC+01:00, Alex Herbert :
> In short:
>
> - Math.hypot is required in Complex to compute sqrt(x^2 + y^2) without
> over/underflow to 1 ULP precision.
> - Complex also requires the same computation without the sqrt or
> over/underflow protection
> - I found the reference for Math.hypot and reimplemented the function
> - My port is 4x faster than Math.hypot and the same accuracy (or better)
> - I will add this as a private internal method in Complex
> - The source for the port requires that the original licence be maintained
>
> The question is where to put the notice of the original license? Copying
> from commons RNG it would be in the source method javadoc and also in all
> LICENSE.txt files through the multi-module project. This seems excessive. I
> thought perhaps to include it only in numbers parent and then the complex
> module where it applies.

IIUC, this would agree with the recommendations here:
http://www.apache.org/dev/licensing-howto.html

[By the way, we should perhaps remove the ".txt" suffix.]

Regards,
Gilles

>
>
> Background
>
> The complex class uses the Math.hypot(double, double) function to determine
> the absolute value of a complex number x + iy as sqrt(x^2 + y^2) without
> over/underflow to 1 ULP precision. This is used directly (in sqrt() and
> abs()) but also without the square root to compute x^2 + y^2 in the log()
> function. These functions also perform over/underflow protection and so
> ideally just require access to the same formula for high precision x^2 +
> y^2. This would enable consistency across the different methods that use the
> absolute of the complex number. Currently the hypot function is very slow in
> the Complex JMH benchmark so I looked into hypot.
>
> This function is known to be slow [1] pre-Java 9 which I was using for
> benchmarking. I found that in Java 9 the code was changed from calling a
> native function to an implementation in Java of the "Freely Distributable
> Maths Library" [2]. The JMH benchmark for complex shows an improvement
> between Java 8 and 9 of about 7-fold speed increase. However this does not
> allow access to the same computation without the square root. The source
> code for fdlibm has a permissive license [3] so I have implemented a port
> that allows separation of the x^2 + y^2 computation from the sqrt and the
> overflow protection.
>
> In testing my ported version I found cases where it was more accurate than
> the Java reference, but none where it was less accurate. I attribute this to
> the different implementation of splitting a number into parts for high
> precision that is different in my port from the original. I used the split
> that is already present in Complex. I tested side-by-side an alternative
> that was closer to the method from fdlibm and it was a bit slower (in Java)
> and the same accuracy as the JDK reference. So I assume that the JDK
> reference has stuck exactly to the fdlibm code. I also found my port to be
> 4x faster than the Java reference. This may require more investigation but
> for now I would like to put my port into Complex as an internal method. Note
> the method is different from the commons FastMath.hypot implementation which
> does not compute the result to 1 ULP. I will add this to the JMH benchmark
> for reference so we have Math.hypot, FastMath.hypot and the hypot method
> within Complex.
>
> The licence for fdlibm is shown in [3]. This states that code can be
> copied/modified as long as the original notice is maintained. In commons RNG
> the licence for the port of the MersenneTwister is placed in the Java source
> file and in all LICENSE.txt files through the multi-module project. So
> should I do the same for numbers or just put the license into the complex
> module?
>
>
> [1]
> https://stackoverflow.com/questions/3764978/why-hypot-function-is-so-slow
>
> [2] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7130085
>
> [3] https://www.netlib.org/fdlibm/e_hypot.c

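To illustrate why sqrt(x^2 + y^2) needs over/underflow protection at all, here is a minimal sketch. It is not the fdlibm algorithm and not the port discussed above; "HypotSketch" and its methods are hypothetical. The "scaled" variant uses the simple textbook rescaling by the larger magnitude, which avoids intermediate overflow but is not claimed to reach 1 ULP accuracy, and it does not reproduce the special-value handling of "Math.hypot" (e.g. for NaN arguments).

```java
/** Hypothetical illustration; not the fdlibm-based implementation. */
final class HypotSketch {
    private HypotSketch() {}

    /** Direct formula: x*x or y*y may overflow to infinity or underflow to zero. */
    static double naive(double x, double y) {
        return Math.sqrt(x * x + y * y);
    }

    /** Rescaled formula: max * sqrt(1 + (min/max)^2), with the ratio in [0, 1]. */
    static double scaled(double x, double y) {
        final double a = Math.abs(x);
        final double b = Math.abs(y);
        final double max = Math.max(a, b);
        final double min = Math.min(a, b);
        if (max == 0 || Double.isInfinite(max)) {
            return max;
        }
        final double r = min / max;
        return max * Math.sqrt(1 + r * r);
    }
}
```

For example, "naive(3e200, 4e200)" overflows to infinity, whereas "scaled" and "Math.hypot" both return a value close to 5e200.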

Re: [numbers] Release?

2020-01-23 Thread Gilles Sadowski

Hi.

2020-01-23 21:50 UTC+01:00, Matt Juntunen :
> Hello,
>
>> Currently, only one unresolved issue tagged for the first release.
>
> I had a look at it (NUMBERS-40) and it looks like all of the changes listed
> in the PR for the issue [1] are in place,

The report asked for reviewing a potential inconsistency in the
usage of exceptions; but PR #6 just changed the type thrown;
see my comment at the time:
https://issues.apache.org/jira/browse/NUMBERS-40?focusedCommentId=16041712=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16041712

> even though the PR was closed over
> 2 years ago. Is this issue still valid?

Yes.
IOW, the question, and report, is about whether usage is consistent
across all of "Commons Numbers".

Regards,
Gilles

>
> Regards,
> Matt
>
> [1] https://github.com/apache/commons-numbers/pull/6
> 
> From: Gilles Sadowski 
> Sent: Thursday, January 23, 2020 11:11 AM
> To: Commons Developers List 
> Subject: Re: [numbers] Release?
>
> Hi.
>
> On Thu, 23 Jan 2020 at 15:04, Matt Juntunen wrote:
>>
>> Hello,
>>
>> Any chance we can get a release (beta or full) for commons-numbers?
>
> Currently, only one unresolved issue tagged for the first release.[1]
>
> Regards,
> Gilles
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20NUMBERS%20AND%20fixVersion%20%3D%201.0%20AND%20statusCategory%20%3D%20new
>
>> As I mentioned in another thread, commons-geometry is ready for a beta
>> release but we need commons-numbers to be released before we can do that.
>>
>> Regards,
>> Matt J
>


Re: [numbers] Release?

2020-01-23 Thread Gilles Sadowski

Hi.

On Thu, 23 Jan 2020 at 15:04, Matt Juntunen wrote:
>
> Hello,
>
> Any chance we can get a release (beta or full) for commons-numbers?

Currently, only one unresolved issue tagged for the first release.[1]

Regards,
Gilles

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20NUMBERS%20AND%20fixVersion%20%3D%201.0%20AND%20statusCategory%20%3D%20new

> As I mentioned in another thread, commons-geometry is ready for a beta 
> release but we need commons-numbers to be released before we can do that.
>
> Regards,
> Matt J


Re: [geometry] Beta Release

2020-01-21 Thread Gilles Sadowski

Hello.

On Tue, 21 Jan 2020 at 18:28, Rob Tompkins wrote:
>
>
>
> > On Jan 21, 2020, at 12:24 PM, Matt Juntunen  
> > wrote:
> >
> > 
> >>
> >> Do we agree that compatibility can be broken
> >> under package "org.apache.commons.beta"?
>
> My impression was that BC can be broken moving from beta to first major 
> release, regardless.

The question was about breaking compatibility in successive *beta*
releases, i.e.
no package name change between versions, say, "beta-0.1" and "beta-0.2".

> But, changing the package name certainly makes the change contentious.

I don't understand.

>
> -Rob
>
> >
> > I believe so. If not, there wouldn't be much of a reason to do a beta 
> > version in the first place; we would just release 1.0 and then quickly jump 
> > to 2.0 (if needed).
> >
> >> Note that "Numbers" should be released first (eventually "beta" too).
> >
> > Is commons-numbers ready for a beta release as well?

I think so (this should be discussed in another post).

Regards,
Gilles

> >
> > -Matt
> > 
> > From: Gilles Sadowski 
> > Sent: Tuesday, January 21, 2020 12:11 PM
> > To: Commons Developers List 
> > Subject: Re: [geometry] Beta Release
> >
> > Hello.
> >
> >> On Tue, 21 Jan 2020 at 14:41, Matt Juntunen wrote:
> >>
> >> Hello,
> >>
> >> I think that commons-geometry is ready for a beta release, as discussed 
> >> previously. If there are no objections, how do we make that happen?
> >
> > Note that "Numbers" should be released first (eventually "beta" too).
> >
> > Best,
> > Gilles
> >
> >>
> >> Regards,
> >> Matt J
> >


Re: [geometry] Beta Release

2020-01-21 Thread Gilles Sadowski

Hello.

On Tue, 21 Jan 2020 at 14:41, Matt Juntunen wrote:
>
> Hello,
>
> I think that commons-geometry is ready for a beta release, as discussed 
> previously. If there are no objections, how do we make that happen?

Note that "Numbers" should be released first (possibly as "beta" too).

Best,
Gilles

>
> Regards,
> Matt J


Re: [geometry] Beta Release

2020-01-21 Thread Gilles Sadowski

Hi.

On Tue, 21 Jan 2020 at 14:47, Rob Tompkins wrote:
>
> I can try to help with that later this week potentially. My main question is: 
> do we want to change the base package name across the component to 
> `org.apache.commons.beta` for the beta release?

+1

Follow-up question: Do we agree that compatibility can be broken
under package "org.apache.commons.beta"?

Gilles

>
> -Rob
>
> > On Jan 21, 2020, at 8:41 AM, Matt Juntunen  
> > wrote:
> >
> > Hello,
> >
> > I think that commons-geometry is ready for a beta release, as discussed 
> > previously. If there are no objections, how do we make that happen?
> >
> > Regards,
> > Matt J


Re: [geometry] Rename Transform to AffineTransform

2020-01-21 Thread Gilles Sadowski

Hello.

2020-01-20 20:39 UTC+01:00, Matt Juntunen :
> Gilles,
>
>> I was not indicating that the name "EuclideanTransform" would be
>> better than "AffineTransform", I was wondering about whether the
>> class itself is redundant.
>
> Oh, I misunderstood. The "EuclideanTransform" interface is important because
> it adds the "applyVector(Vector)" method, which has different behavior than
> the standard "apply" method. All transforms in the euclidean packages have
> this method but it is not present in the core Transform because not all
> spaces have associated Vector types (eg, spherical). I had renamed it
> AffineTransform in the previous PR not because it exposed new functionality
> or behavior that made it affine, but because it was located in a module
> defining an affine space. What do you suggest for the name here?

I guess that "EuclideanTransform" is fine (as an extension of the
functionality not the requirements of being "affine").
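
A minimal sketch of that split, for the record. The types below are hypothetical, simplified 2D stand-ins and not the Commons Geometry API. They assume that the difference between "apply" and "applyVector" is that a vector (a displacement) is not affected by the translation part of the transform.

```java
/** Hypothetical stand-in for the space-agnostic transform: points only. */
interface PointTransform2D {
    double[] apply(double[] point);
}

/** Hypothetical stand-in for the Euclidean extension: adds vector handling. */
interface EuclideanTransform2D extends PointTransform2D {
    double[] applyVector(double[] vector);
}

/** Affine map p -> M p + t, stored as a 2x2 matrix and a translation. */
final class AffineSketch2D implements EuclideanTransform2D {
    private final double m00;
    private final double m01;
    private final double m10;
    private final double m11;
    private final double tx;
    private final double ty;

    AffineSketch2D(double m00, double m01, double m10, double m11,
                   double tx, double ty) {
        this.m00 = m00;
        this.m01 = m01;
        this.m10 = m10;
        this.m11 = m11;
        this.tx = tx;
        this.ty = ty;
    }

    /** Points are moved by the linear part and the translation. */
    @Override
    public double[] apply(double[] p) {
        return new double[] {m00 * p[0] + m01 * p[1] + tx,
                             m10 * p[0] + m11 * p[1] + ty};
    }

    /** Vectors only see the linear part (assumption stated above). */
    @Override
    public double[] applyVector(double[] v) {
        return new double[] {m00 * v[0] + m01 * v[1],
                             m10 * v[0] + m11 * v[1]};
    }
}
```

A space without an associated vector type (e.g. spherical) would implement only the first interface.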

Regards,
Gilles

>
> [...]


Re: [geometry] Rename Transform to AffineTransform

2020-01-20 Thread Gilles Sadowski

Hello.

On Mon, 20 Jan 2020 at 16:57, Matt Juntunen wrote:
>
> Gilles,
>
> > From a design POV, I still think that "AffineTransform" is redundant:
>
> The "AffineTransform" name change has been reverted. It is now the original 
> "EuclideanTransform".

I was not indicating that the name "EuclideanTransform" would be
better than "AffineTransform", I was wondering about whether the
class itself is redundant.

> I've closed PR #58 and created PR #59 with the latest commits squashed.

I've not looked yet.  But answering below, to hopefully clarify
the misunderstanding.

> > IIUC, the required (not just "desired") properties should stand out.
> > And, for the mathematically-inclined, the relationship to affine
> > transforms would illustrate it (for Euclidean spaces).
>
> I'm not sure what you're saying here.

My understanding is that "Transform" can be documented as:
---CUT---
In Euclidean space, this must be an affine transform.
---CUT---

Gilles

> The current documentation is the most complete and mathematically accurate.
>
> -Matt
>>> [...]


Re: [geometry] Rename Transform to AffineTransform

2020-01-20 Thread Gilles Sadowski

Hello.

2020-01-20 14:28 UTC+01:00, Matt Juntunen :
> Gilles,
>
>> I had a (quick) look; is it necessary to split functionality among
>> "Transform"
>> (in "core") and its subinterfaces/classes in other modules?  IOW, if
>> "Transform"
>> can only be affine, it looks strange to have "AffineTransform"
>> (re)defined.
>
> This is a documentation issue.
> The name "affine transform" only applies to
> affine spaces such as Euclidean space. Spherical space is not an affine
> space. The "Transform" interface is intended to represent transforms with
> the desired properties regardless of whether the space is affine or not.

From a design POV, I still think that "AffineTransform" is redundant:
 * If "Transform" has the "desired properties"
 * Then, in an affine space, "Transform" is an affine transform.

> This was not clear in the docs since the word "affine" is listed as an
> implementation requirement on the "Transform" interface. I've updated the
> docs and userguide to clarify this.

IIUC, the required (not just "desired") properties should stand out.
And, for the mathematically-inclined, the relationship to affine
transforms would illustrate it (for Euclidean spaces).

>
>
>> I'm also a bit puzzled by the "AbstractAffineTransformMatrix" that seems
>> to
>> only contain convenience methods for internal use (whereas having them
>> "protected" put them in the public API).
>
> That class also contains other matrix-specific methods (eg, "determinant")
> and the overridden "preservesOrientation". Good point on the protected
> methods, though. I've moved them into the internal "Matrices" utility
> class.

Thanks.

Gilles

>
> -Matt
> 
> From: Gilles Sadowski 
> Sent: Sunday, January 19, 2020 9:06 AM
> To: Commons Developers List 
> Subject: Re: [geometry] Rename Transform to AffineTransform
>
> Hi.
>
> Le sam. 18 janv. 2020 à 23:14, Matt Juntunen
>  a écrit :
>>
>> Gilles,
>>
>> >> There, we can simply sample the user-defined function
>> > I'm not sure I understand.
>>
>> Just an implementation detail. We need to pass some sample points through
>> the user-defined function in order to construct an equivalent matrix.
>>
>> > Throwing an exception if the transform does not abide by
>> > the requirements?
>>
>> Yes.
>>
>> I just submitted a PR on Github with these changes. I also realized that
>> the EuclideanTransform class as discussed exactly matches the definition
>> of an affine transform so I renamed it to AffineTransform. No other names
>> were changed.
>
> I had a (quick) look; is it necessary to split functionality among
> "Transform"
> (in "core") and its subinterfaces/classes in other modules?  IOW, if
> "Transform"
> can only be affine, it looks strange to have "AffineTransform" (re)defined.
>
> I'm also a bit puzzled by the "AbstractAffineTransformMatrix" that seems to
> only contain convenience methods for internal use (whereas having them
> "protected" put them in the public API).
>
> Regards,
> Gilles
>
>>
>> -Matt
>> 
>> From: Gilles Sadowski 
>> Sent: Saturday, January 18, 2020 1:40 PM
>> To: Commons Developers List 
>> Subject: Re: [geometry] Rename Transform to AffineTransform
>>
>> Hello.
>>
>> 2020-01-18 15:40 UTC+01:00, Matt Juntunen :
>> > Gilles,
>> >
>> >> If the "Transform" is intimately related to the "core" and there is a
>> >> single
>> >> set of properties that make it "affine" (and work correctly), I'd tend
>> >> to
>> >> keep the name "Transform".
>> >
>> > So, if I'm understanding you correctly, you're saying that since the
>> > partitioning code in the library only works with these types of
>> > parallelism-preserving transforms, it can be safely assumed that
>> > o.a.c.geometry.core.Transform represents such a transform. Is this
>> > correct?
>>
>> Indeed.
>>
>> > One thing that's causing some issues with the implementation here is
>> > that
>> > the Euclidean TransformXD interfaces have static
>> > "from(UnaryOperator)"
>> > methods that allow users to wrap their own, arbitrary vector operations
>> > as
>> > Transform instances. We

Re: [math]New feature MiniBatchKMeansClusterer

2020-01-20 Thread Gilles Sadowski

Hi.

2020-01-20 3:08 UTC+01:00, CT :
> Hi, In my picture search project, I need a clustering algorithm to narrow
> the dataset, in order to accelerate the search over millions of pictures.
>  At first we used python+pytorch+kmeans; as the data grew from
> thousands to millions, the KMeans clustering became slower and
> slower (seconds to minutes). Then we found that MiniBatchKMeans could
> finish the clustering in 1~2 seconds on millions of data points.

Impressive improvement.

>  Meanwhile we still faced the insufficient concurrency of
> python, so we switched to kotlin on the jvm.
>  But there was no MiniBatchKMeans algorithm on the jvm yet, so I wrote
> one in kotlin, referring to the (python) sklearn MiniBatchKMeans and Apache
> Commons Math (Deeplearning4j was also considered, but it is too slow because
> of ND4j's design).
>
>
>  I'd like to contribute it to Apache Commons Math,

Thanks!

> and I wrote a java
> version:
> https://github.com/chentao106/commons-math/tree/feature-MiniBatchKMeans

Some remarks:

* I didn't get why the "KMeansPlusPlusCentroidInitializer" class
does not call the existing "KMeansPlusPlusClusterer".
Code seems duplicated: As there is a case for reuse, the currently
"private" centroid initialization code should be factored out.

* In "CentroidInitializer", I'd rename "chooseCentroids" to emphasize
that a computation is performed (as opposed to selecting from an
existing data structure).

* Not convinced that there should be so many constructors (in most
cases, it makes no sense to pick default values that are likely to
be heavily application-dependent).

* Input should generally be validated: e.g. the maximum number of
iterations should not be changed unwittingly; rather, an exception
should be raised if the user passed a negative value.
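
As a minimal sketch of the last point (the class and field names below
are hypothetical; they are not taken from the PR or from Commons Math),
validation at construction time could look like:

```java
/** Hypothetical holder for an iteration limit; rejects invalid input early. */
final class IterationLimit {
    private final int maxIterations;

    IterationLimit(int maxIterations) {
        // Fail fast instead of silently replacing the value with a default.
        if (maxIterations < 0) {
            throw new IllegalArgumentException(
                "maxIterations must be non-negative: " + maxIterations);
        }
        this.maxIterations = maxIterations;
    }

    int get() {
        return maxIterations;
    }
}
```

The caller then learns about the mistake at the point where it is made,
rather than through a clustering run that behaves unexpectedly.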

>
>  From my test (Kotlin version), it is very fast; in most cases it gives
> results slightly different from KMeans++, but sometimes the difference is
> big (maybe affected by the randomness of the mini batch):

Could be nice to illustrate (not just with a picture, but in a table
with entries averaged over several runs) the differences in result
between the implementations, using various setups (number of
clusters, stopping criterion, etc.).

>
>
>
> Some bad case:
>
> It is even worse when I use RandomSource.create(RandomSource.MT_64, 0) for
> the random generator ┐(´-｀)┌.

"MT_64" is probably not the best default.  And this is one of the
parameters for which there should not be a default IMO.

[Note: there are spurious characters in your message (see e.g. the
paragraph quoted just above) that make it difficult to read.]

Best regards,
Gilles

>
>
> My brief understanding of MiniBatchKMeans:
> Use a subset of the points to initialize the cluster centers, and random mini
> batches in the training iterations.
> It can finish in a few seconds when clustering millions of data points, and
> its results differ little from those of KMeans.
>
>
> More information about MiniBatchKMeans
>  https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
> 
> https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html
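
For reference, the idea described above can be sketched as follows.
This is only a rough illustration of the per-center learning-rate update
from the Sculley paper linked above; it is not the code of the PR.  The
class and method names are hypothetical, initial centers are assumed to
be supplied by the caller, and "java.util.SplittableRandom" merely stands
in for a Commons RNG generator.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of mini-batch k-means (centers are updated in place). */
final class MiniBatchKMeansSketch {
    /** Index of the center closest (squared Euclidean distance) to p. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    static void fit(double[][] data, double[][] centers,
                    int batchSize, int iterations, SplittableRandom rng) {
        if (batchSize <= 0 || iterations < 0) {
            throw new IllegalArgumentException("Invalid batch size or iteration count");
        }
        long[] counts = new long[centers.length]; // points seen per center
        int[] batch = new int[batchSize];
        int[] assigned = new int[batchSize];
        for (int it = 0; it < iterations; it++) {
            // 1. Draw a random mini batch and cache the nearest center of each point.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(data.length);
                assigned[b] = nearest(centers, data[batch[b]]);
            }
            // 2. Move each center towards its points with rate 1 / (points seen).
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                double eta = 1.0 / ++counts[c];
                double[] p = data[batch[b]];
                for (int j = 0; j < p.length; j++) {
                    centers[c][j] = (1 - eta) * centers[c][j] + eta * p[j];
                }
            }
        }
    }
}
```

Only "batchSize" points are visited per iteration, which is where the
speed-up over a full pass comes from; the price is that the result
depends on the random batches.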


Re: [geometry] Rename Transform to AffineTransform

2020-01-19 Thread Gilles Sadowski

Hi.

Le sam. 18 janv. 2020 à 23:14, Matt Juntunen
 a écrit :
>
> Gilles,
>
> >> There, we can simply sample the user-defined function
> > I'm not sure I understand.
>
> Just an implementation detail. We need to pass some sample points through the 
> user-defined function in order to construct an equivalent matrix.
>
> > Throwing an exception if the transform does not abide by
> > the requirements?
>
> Yes.
>
> I just submitted a PR on Github with these changes. I also realized that the 
> EuclideanTransform class as discussed exactly matches the definition of an 
> affine transform so I renamed it to AffineTransform. No other names were 
> changed.

I had a (quick) look; is it necessary to split functionality among "Transform"
(in "core") and its subinterfaces/classes in other modules?  IOW, if "Transform"
can only be affine, it looks strange to have "AffineTransform" (re)defined.

I'm also a bit puzzled by the "AbstractAffineTransformMatrix" that seems to
only contain convenience methods for internal use (whereas having them
"protected" put them in the public API).

Regards,
Gilles

>
> -Matt
> 
> From: Gilles Sadowski 
> Sent: Saturday, January 18, 2020 1:40 PM
> To: Commons Developers List 
> Subject: Re: [geometry] Rename Transform to AffineTransform
>
> Hello.
>
> 2020-01-18 15:40 UTC+01:00, Matt Juntunen :
> > Gilles,
> >
> >> If the "Transform" is intimately related to the "core" and there is a
> >> single
> >> set of properties that make it "affine" (and work correctly), I'd tend to
> >> keep the name "Transform".
> >
> > So, if I'm understanding you correctly, you're saying that since the
> > partitioning code in the library only works with these types of
> > parallelism-preserving transforms, it can be safely assumed that
> > o.a.c.geometry.core.Transform represents such a transform. Is this correct?
>
> Indeed.
>
> > One thing that's causing some issues with the implementation here is that
> > the Euclidean TransformXD interfaces have static "from(UnaryOperator)"
> > methods that allow users to wrap their own, arbitrary vector operations as
> > Transform instances. We don't (and really can't) do any validation on these
> > user-defined functions to ensure that they meet the library requirements. It
> > is therefore easy for users to pass in invalid operators. To avoid this, I'm
> > thinking of removing the TransformXD interfaces completely and moving the
> > "from(UnaryOperator)" methods into the AffineTransformMatrixXD classes.
>
> +1
> It is generally good to prevent the creation of invalid objects.
>
> > There, we can simply sample the user-defined function
>
> I'm not sure I understand.
>
> > as needed and produce
> > matrices that are guaranteed to be affine.
>
> Throwing an exception if the transform does not abide by
> the requirements?
>
> > Following the above, the class hierarchy would then be as below, which is
> > basically what it was before I added the TransformXD interfaces.
> >
> > commons-geometry-core
> >     Transform
> >
> > commons-geometry-euclidean
> >     EuclideanTransform extends Transform
> >     AffineTransformMatrixXD implements EuclideanTransform
> >     Rotation3D extends EuclideanTransform
> >     QuaternionRotation implements Rotation3D
> >
> > commons-geometry-spherical
> >     Transform1S implements Transform
> >     Transform2S implements Transform
> >
> > WDYT?
>
> +1
>
> Best,
> Gilles
>
> >
> > -Matt
> >
> >
> > 
> > From: Gilles Sadowski 
> > Sent: Monday, January 13, 2020 8:03 PM
> > To: Commons Developers List 
> > Subject: Re: [geometry] Rename Transform to AffineTransform
> >
> > Hi.
> >
> > Le lun. 13 janv. 2020 à 04:39, Matt Juntunen
> >  a écrit :
> >>
> >> Gilles,
> >>
> >> > How about keeping "Transform" for the interface name and define a method
> >> > ... boolean isAffine();
> >>
> >> I would prefer to have separate types for each kind of transform.
> >> This would make the API clear and would avoid numerous checks in the code
> >> in order to see if a particular transform instance is supported. The
> >> transform types also generally have an "is-a" relationship with each
> >> other, which seems like a perfect fit for inheritance. [1]
> >>
> >

Re: [geometry] Rename Transform to AffineTransform

2020-01-18 Thread Gilles Sadowski

Hello.

2020-01-18 15:40 UTC+01:00, Matt Juntunen :
> Gilles,
>
>> If the "Transform" is intimately related to the "core" and there is a
>> single
>> set of properties that make it "affine" (and work correctly), I'd tend to
>> keep the name "Transform".
>
> So, if I'm understanding you correctly, you're saying that since the
> partitioning code in the library only works with these types of
> parallelism-preserving transforms, it can be safely assumed that
> o.a.c.geometry.core.Transform represents such a transform. Is this correct?

Indeed.

> One thing that's causing some issues with the implementation here is that
> the Euclidean TransformXD interfaces have static "from(UnaryOperator)"
> methods that allow users to wrap their own, arbitrary vector operations as
> Transform instances. We don't (and really can't) do any validation on these
> user-defined functions to ensure that they meet the library requirements. It
> is therefore easy for users to pass in invalid operators. To avoid this, I'm
> thinking of removing the TransformXD interfaces completely and moving the
> "from(UnaryOperator)" methods into the AffineTransformMatrixXD classes.

+1
It is generally good to prevent the creation of invalid objects.

> There, we can simply sample the user-defined function

I'm not sure I understand.

> as needed and produce
> matrices that are guaranteed to be affine.

Throwing an exception if the transform does not abide by
the requirements?
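
To make the question concrete, here is a sketch of what I have in mind,
in 2D.  All names are hypothetical (this is not the code of the PR), the
tolerance is an arbitrary choice, and checking a single probe point
cannot prove that a function is affine; it only catches obvious
violations:

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch: build a 2D affine matrix by sampling a user function. */
final class AffineFromSamples {
    /** Returns {m00, m01, tx, m10, m11, ty} such that f(x, y) = M * (x, y) + t. */
    static double[] from(UnaryOperator<double[]> f) {
        double[] t = f.apply(new double[] {0, 0}); // image of the origin: translation
        double[] u = f.apply(new double[] {1, 0}); // image of the first basis vector
        double[] v = f.apply(new double[] {0, 1}); // image of the second basis vector
        double[] m = {
            u[0] - t[0], v[0] - t[0], t[0],
            u[1] - t[1], v[1] - t[1], t[1]
        };
        // A point not used for the construction must be mapped consistently.
        double[] probe = {2, -3};
        double[] expected = f.apply(probe.clone());
        double x = m[0] * probe[0] + m[1] * probe[1] + m[2];
        double y = m[3] * probe[0] + m[4] * probe[1] + m[5];
        if (Math.abs(x - expected[0]) > 1e-12 || Math.abs(y - expected[1]) > 1e-12) {
            throw new IllegalArgumentException("Function is not an affine transform");
        }
        // The documented requirement that an inverse exists: non-zero determinant.
        double det = m[0] * m[4] - m[1] * m[3];
        if (det == 0 || !Double.isFinite(det)) {
            throw new IllegalArgumentException("Transform is not invertible");
        }
        return m;
    }
}
```

With something like this, an invalid user-defined function is rejected
when the matrix is created, so no invalid "Transform" instance exists.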

> Following the above, the class hierarchy would then be as below, which is
> basically what it was before I added the TransformXD interfaces.
>
> commons-geometry-core
>     Transform
>
> commons-geometry-euclidean
>     EuclideanTransform extends Transform
>     AffineTransformMatrixXD implements EuclideanTransform
>     Rotation3D extends EuclideanTransform
>     QuaternionRotation implements Rotation3D
>
> commons-geometry-spherical
>     Transform1S implements Transform
>     Transform2S implements Transform
>
> WDYT?

+1

Best,
Gilles

>
> -Matt
>
>
> 
> From: Gilles Sadowski 
> Sent: Monday, January 13, 2020 8:03 PM
> To: Commons Developers List 
> Subject: Re: [geometry] Rename Transform to AffineTransform
>
> Hi.
>
> Le lun. 13 janv. 2020 à 04:39, Matt Juntunen
>  a écrit :
>>
>> Gilles,
>>
>> > How about keeping "Transform" for the interface name and define a method
>> > ... boolean isAffine();
>>
>> I would prefer to have separate types for each kind of transform.
>> This would make the API clear and would avoid numerous checks in the code
>> in order to see if a particular transform instance is supported. The
>> transform types also generally have an "is-a" relationship with each
>> other, which seems like a perfect fit for inheritance. [1]
>>
>> > I don't get that it is an "accuracy" issue. If some requirement is not
>> > met, results will be plain wrong
>>
>> Yes, you are correct. I was not very clear in what I wrote. The results
>> will be completely unusable if the transform does not meet the
>> requirements.
>>
>> > I wonder why the documented requirement that an "inverse transform
>> > must exist" does not translate into a method ... getInverse();
>>
>> Good point. All current implementations are able to provide an inverse so
>> that method should be present on the interface.
>>
>> In regard to renaming the Transform interface, I had another idea. The
>> main purpose of that interface is to provide a way for the partitioning
>> code in the core module to implement generic transforms of BSP trees (see
>> AbstractBSPTree.transform()). This is what leads to the requirement that
>> the transform preserve parallelism, since otherwise, the represented
>> region may be warped in such a way as to make the tree invalid. However,
>> as far as I can tell, there is not a general mathematical term for this
>> type of transform that applies to Euclidean and non-Euclidean geometries.
>> So, my thought is to move the Transform interface to the "partitioning"
>> package to indicate its relationship to those classes and simply name it
>> something descriptive like "ParallelismPreservingTransform" ("parallelism"
>> since that is the more generic, non-Euclidean form of the concept of
>> "parallel"). The Euclidean module could then provide a true
>> "AffineTransform" interface that extends "ParallelismPreservingTransform".
>> The spherical module transforms wo

Re: [rng] FiniteDoubleSampler to sample uniformly from a range of representable double values

2020-01-17 Thread Gilles Sadowski

Hi.

Le ven. 17 janv. 2020 à 13:15, Alex Herbert  a écrit :
>
> The UniformRandomProvider and ContinuousUniformSampler can create
> doubles uniformly in a range [0, 1) and [x, y).
>
> I would like to create a sampler that can create doubles with random
> mantissa bits and a specified range of exponents. These would not follow
> a standard distribution but would be distributed according to the IEEE
> 754 floating-point "double format" bit layout.
>
> The sampler would ensure all bits of the mantissa are uniformly sampled
> and the exponent range is uniformly sampled. It could be used for
> example to create random double data in a specified range for testing.

Are there use-cases other than providing inputs for unit tests?

>
> Following from UniformRandomProvider I propose to remove the sign bit to
> allow a maximum range sampler from [0, Double.MAX_VALUE].

Is there a trade-off to allowing generation of the whole range of "double"s?

> Thus we have
> the API:
>
> public final class FiniteDoubleSampler {
>
>     /**
>      * Creates a new finite double sampler.
>      *
>      * This will return double values from the entire exponent range of a finite
>      * {@code double} including sub-normal numbers. The value will be unsigned.
>      *
>      * @param rng Generator of uniformly distributed random numbers.
>      * @return Sampler.
>      * @since 1.4
>      */
>     public static SharedStateContinuousSampler of(UniformRandomProvider rng);
>
>     /**
>      * Creates a new finite double sampler.
>      *
>      * This will return double values from the specified exponent range of a finite
>      * {@code double}. This assumes all sub-normal numbers are identified with the
>      * exponent -1023. The value will be unsigned.
>      *
>      * @param rng Generator of uniformly distributed random numbers.
>      * @param minExponent Minimum exponent
>      * @param maxExponent Maximum exponent
>      * @see Double#MIN_EXPONENT
>      * @see Double#MAX_EXPONENT
>      * @throws IllegalArgumentException If the exponents are not in the range -1023
>      * inclusive to 1023 inclusive; or the min exponent is not {@code <=} max exponent.
>      * @return Sampler.
>      * @since 1.4
>      */
>     public static SharedStateContinuousSampler of(UniformRandomProvider rng,
>                                                   int minExponent,
>                                                   int maxExponent);
> }
>
> I have written many tests where I wanted full precision random mantissas
> in double values and wanted the doubles to represent all doubles that
> could occur.

How do you ensure it (short of listing them all)?

> Thus randomly sampling from the IEEE 754 representation
> seems to be more generic than for example rng.nextDouble() * constant.

What's the advantage of the former over the latter?

> For example:
>
> // Random numbers: [0, Double.MAX_VALUE]
> FiniteDoubleSampler.of(rng);
> FiniteDoubleSampler.of(rng, -1023, 1023);
> // Random sub-normal numbers: [0, Double.MIN_NORMAL)
> FiniteDoubleSampler.of(rng, -1023, -1023);
> // Random numbers that are close to overflow:
> // (Double.MAX_VALUE/2, Double.MAX_VALUE]
> FiniteDoubleSampler.of(rng, 1023, 1023);
> // Random numbers in the range [1, 32)
> FiniteDoubleSampler.of(rng, 0, 4);

Will the distribution be the same as for "32 * rng.nextDouble()"?
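
To illustrate why I ask: if the sampler were implemented along the
lines of the sketch below (my assumption of the intent, not the proposed
code; the names are hypothetical and "java.util.SplittableRandom" stands
in for a "UniformRandomProvider"), each exponent would be equally likely.
For the range [1, 32) that puts one fifth of the samples in each of
[1, 2), [2, 4), ..., [16, 32), whereas "32 * rng.nextDouble()" puts half
of them in [16, 32).

```java
import java.util.SplittableRandom;

/** Hypothetical sketch: unsigned double with uniform exponent and mantissa bits. */
final class FiniteDoubleSketch {
    static double sample(SplittableRandom rng, int minExponent, int maxExponent) {
        if (minExponent < -1023 || maxExponent > 1023 || minExponent > maxExponent) {
            throw new IllegalArgumentException(
                "Invalid exponent range: [" + minExponent + ", " + maxExponent + "]");
        }
        // Biased exponent field: 0 encodes the sub-normals (identified here with
        // exponent -1023), 2046 is the largest finite exponent (1023).
        long biased = rng.nextInt(minExponent, maxExponent + 1) + 1023L;
        // 52 uniformly random mantissa bits.
        long mantissa = rng.nextLong() >>> 12;
        // The sign bit is left at zero: the value is unsigned.
        return Double.longBitsToDouble((biased << 52) | mantissa);
    }
}
```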

Regards,
Gilles

> Thoughts on this?
>
> Alex
>


[Math] Bypassing official channels... (Was: [GitHub] [commons-math] chentao106 opened a new pull request #117: Implement the MiniBatchKMeansClusterer)

2020-01-16 Thread Gilles Sadowski

Hi.

Can someone comment on GitHub that communication should be through
the "dev" ML and/or JIRA?

Thanks,
Gilles

Le jeu. 16 janv. 2020 à 17:37, GitBox  a écrit :
>
> chentao106 opened a new pull request #117: Implement the 
> MiniBatchKMeansClusterer
> URL: https://github.com/apache/commons-math/pull/117
>
>
>    Implement the MiniBatchKMeansClusterer and a unit test that compares it to
> KMeansPlusPlusClusterer.
>    MiniBatchKMeans is a fast clustering algorithm based on KMeans (refer to
> Python sklearn.cluster.MiniBatchKMeans).
>    It uses a subset of the points to initialize the cluster centers, and mini
> batches in the iterations.
>    It can finish in a few seconds when clustering millions of data points, and
> its results differ little from those of KMeans.
>    See https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
>
> 
> This is an automated message from the Apache Git Service.
> To respond to the message, please log on to GitHub and use the
> URL above to go to the specific comment.
>
> For queries about this service, please contact Infrastructure at:
> us...@infra.apache.org
>
>
> With regards,
> Apache Git Services


[All] Sonarcloud reports zero coverage

2020-01-15 Thread Gilles Sadowski

Hello.

"Sonar" reports are created for several projects, a.o.
https://sonarcloud.io/dashboard?id=commons-numbers
https://sonarcloud.io/dashboard?id=commons-geometry
https://sonarcloud.io/dashboard?id=commons-rng
https://sonarcloud.io/dashboard?id=commons-statistics
for which coverage is now reported as 0%, although it was reported
correctly earlier.

Regards,
Gilles


Re: [commons-numbers] branch master updated: Add benchmark for sin/cos computation.

2020-01-14 Thread Gilles Sadowski

Hi.

Le mar. 14 janv. 2020 à 12:24,  a écrit :
>
> [...]
>
> Add benchmark for sin/cos computation.
>
> Computing sin/cos together would improve many of the functions in
> Complex. This benchmark investigates the possibility of using the
> Commons FastMath implementation instead of java.lang.Math.

Related issues:
https://issues.apache.org/jira/browse/MATH-901
https://issues.apache.org/jira/browse/MATH-1113
https://issues.apache.org/jira/browse/MATH-740

But CM does not provide sine and cosine in a single function call.

Regards,
Gilles

> [...]


Re: [geometry] Rename Transform to AffineTransform

2020-01-13 Thread Gilles Sadowski

Hi.

Le lun. 13 janv. 2020 à 04:39, Matt Juntunen
 a écrit :
>
> Gilles,
>
> > How about keeping "Transform" for the interface name and define a method 
> > ... boolean isAffine();
>
> I would prefer to have separate types for each kind of transform.
> This would make the API clear and would avoid numerous checks in the code in 
> order to see if a particular transform instance is supported. The transform 
> types also generally have an "is-a" relationship with each other, which seems 
> like a perfect fit for inheritance. [1]
>
> > I don't get that it is an "accuracy" issue. If some requirement is not met,
> > results will be plain wrong
>
> Yes, you are correct. I was not very clear in what I wrote. The results will 
> be completely unusable if the transform does not meet the requirements.
>
> > I wonder why the documented requirement that an "inverse transform
> > must exist" does not translate into a method ... getInverse();
>
> Good point. All current implementations are able to provide an inverse so 
> that method should be present on the interface.
>
> In regard to renaming the Transform interface, I had another idea. The main 
> purpose of that interface is to provide a way for the partitioning code in 
> the core module to implement generic transforms of BSP trees (see 
> AbstractBSPTree.transform()). This is what leads to the requirement that the 
> transform preserve parallelism, since otherwise, the represented region may 
> be warped in such a way as to make the tree invalid. However, as far as I can 
> tell, there is not a general mathematical term for this type of transform 
> that applies to Euclidean and non-Euclidean geometries. So, my thought is to 
> move the Transform interface to the "partitioning" package to indicate its 
> relationship to those classes and simply name it something descriptive like 
> "ParallelismPreservingTransform" ("parallelism" since that is the more 
> generic, non-Euclidean form of the concept of "parallel"). The Euclidean 
> module could then provide a true "AffineTransform" interface that extends 
> "ParallelismPreservingTransform". The spherical module transforms would 
> directly inherit from "ParallelismPreservingTransform" and thus avoid any 
> incorrect usage of the term "affine". The class hierarchy would then look 
> like this:
>
> commons-geometry-core
>     ParallelismPreservingTransform
>
> commons-geometry-euclidean
>     AffineTransform extends ParallelismPreservingTransform
>     AffineTransformXD extends AffineTransform
>     AffineTransformMatrixXD implements AffineTransformXD
>     Rotation3D extends AffineTransform3D
>     QuaternionRotation implements Rotation3D
>
> commons-geometry-spherical
>     Transform1S implements ParallelismPreservingTransform
>     Transform2S implements ParallelismPreservingTransform
>
> I think the type names here are much more descriptive and mathematically 
> accurate. WDYT?

I'm not convinced that such a hierarchy would enhance clarity.
If the "Transform" is intimately related to the "core" and there is a single
set of properties that make it "affine" (and work correctly), I'd tend to
keep the name "Transform".  [As long as unit tests ensure that concrete
implementations possess the expected properties, we are safe.]

Regards,
Gilles

> -Matt
>
>
> [1] https://en.wikipedia.org/wiki/Geometric_transformation
>
> 
> From: Gilles Sadowski 
> Sent: Wednesday, January 8, 2020 8:16 AM
> To: Commons Developers List 
> Subject: Re: [geometry] Rename Transform to AffineTransform
>
> Hi.
>
> Le mer. 8 janv. 2020 à 04:39, Matt Juntunen
>  a écrit :
> >
> > Gilles,
> >
> > > I thought that the question was how to replace "transform"...
> >
> > I should probably clarify. I want to change the name of the Transform class 
> > to make it clear that it only represents transforms that preserve 
> > parallelism (eg, affine transforms). The most obvious name would be 
> > AffineTransform
>
> How about keeping "Transform" for the interface name and define a method
> ---CUT---
> /**
>  * Move here the doc explaining under what conditions this method can
>  * return "true".
>  */
> boolean isAffine();
> ---CUT---
> ?
>
> Gilles
>
> > like I suggested but I want to make sure that no one objects to this for 
> > design or mathematical reasons.
> >
> > > Anyways, what would be the issue(s) of a non-affine transform?
> > > IOW why should implementations of "Transf

Re: [numbers] LinearCombination dot product accuracy

2020-01-10 Thread Gilles Sadowski

Hi.

Le ven. 10 janv. 2020 à 22:45, Alex Herbert  a écrit :
>
>
>
> > On 10 Jan 2020, at 18:12, Gilles Sadowski  wrote:
> >
> >> ...
> >
> > IIUC, precision (without resorting to "BigDecimal") was the purpose of
> > "LinearCombination".  +1 for making the appropriate changes to the
> > existing "value" method.
> > [I'd suggest to open a JIRA issue and mention the rest of the alternatives
> > there, for future reference; but I wouldn't add them to the API until there
> > is a need for it.]
>
> OK. If the purpose is precision and to avoid BigDecimal then switching to 
> dot3 seems the best option as it performs one pass over the round-off errors 
> to improve the summation in the case of badly conditioned dot products. This 
> is an alternative to sorting the input products before (and during) summing 
> and faster.
>
> I’ll write the code to allow expansion to dot4, dot5, etc but not add this to 
> the API.
>
> Previously I used the timings from the summation only for the 2.2x slower 
> estimate. The timings for the dot product are better. Dot2 to dot3 should be 
> 12.5 to 18.5 so about 50% slower (table 6.5 in Ogita’s paper).
>
> I am going to set up a JMH module for numbers as I would like to benchmark 
> the operations in Complex. This is for reference with regard to the idea to 
> create the previously discussed ComplexList that stores the numbers as 
> primitive arrays and then does operations by creating instances of Complex 
> and then returning the results to the primitive arrays. I hope to show that 
> for most operations the overhead of creating Complex objects is negligible. 
> There may be cases where special implementations would be useful such as 
> multiply or divide of two lists in a paired operation.
>
> I was going to use the format from [rng] for the JMH module thus:
>
> commons-numbers/commons-numbers-examples/examples-jmh
>
> However I do not have any other examples.

One never knows.  Perhaps others will come up.
E.g. usage of "Complex" in order to call JTransform...

Gilles

> So perhaps just a single child module that is included via a profile as it 
> need not be a distributed artifact:
>
> commons-numbers/commons-numbers-jmh
>
> WDYT?
>
> I can build the new LinearCombination methods and put the old ones in the jmh 
> project with a test of relative speed.
>
> Then I’ll get back to Complex (which is nearing completion) and the 
> ComplexList.
>
> Alex
>
>


Re: [numbers] LinearCombination dot product accuracy

2020-01-10 Thread Gilles Sadowski

etter here.
>
> There is a MatLab script in Ogita et al to generate vectors for 
> multiplication. This may be worth investigating. For now I would not 
> recommend switching to Dekker’s dot product unless there is evidence it is 
> generally applicable.
>
> 5. Add methods value3(…) that use 3-fold precision and unroll the loops for 
> small vectors.
>
>
> I think that option 2 is preferred as all split multiply methods should use 
> the correct split from Dekker.
>
> I would vote for option 3 to add a higher precision version if a user 
> desires. Since this may be 2x slower it may not be the best option for the 
> default.
>
> I would opt to implement option 5 and test it with JMH to see if dot3 is 2.2x 
> slower than dot2 in the JVM.
>
>
> Thus most of this can be put into a JMH module and tested against the current 
> class to see the performance effect.
>
> The type of changes that may end up in LinearCombination comes down to 
> whether you want precision over speed.

IIUC, precision (without resorting to "BigDecimal") was the purpose of
"LinearCombination".  +1 for making the appropriate changes to the
existing "value" method.
[I'd suggest to open a JIRA issue and mention the rest of the alternatives
there, for future reference; but I wouldn't add them to the API until there
is a need for it.]
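
For the JIRA issue, a compact sketch of the kind of compensated dot
product under discussion may help.  This is my reading of the "Dot2"
scheme (Dekker's split for the product error, plus an error-free sum);
the class and method names are hypothetical, it is not the current
"LinearCombination" code, and it ignores overflow in the split for very
large inputs:

```java
/** Hypothetical sketch of a compensated ("Dot2"-style) dot product. */
final class Dot2Sketch {
    /** Dekker/Veltkamp split factor: 2^27 + 1. */
    private static final double SPLIT = 134217729.0;

    /** Round-off error of the product, given p = a * b computed in double. */
    static double productError(double a, double b, double p) {
        double c = SPLIT * a;
        double aHi = c - (c - a);
        double aLo = a - aHi;
        c = SPLIT * b;
        double bHi = c - (c - b);
        double bLo = b - bHi;
        return aLo * bLo - (((p - aHi * bHi) - aLo * bHi) - aHi * bLo);
    }

    static double dot2(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double sum = 0;  // running high-order sum
        double comp = 0; // accumulated round-off errors
        for (int i = 0; i < x.length; i++) {
            double h = x[i] * y[i];
            double r = productError(x[i], y[i], h);
            // Error-free transformation of sum + h into (t, q).
            double t = sum + h;
            double z = t - sum;
            double q = (sum - (t - z)) + (h - z);
            sum = t;
            comp += q + r;
        }
        return sum + comp;
    }
}
```

For the vectors {1e16, 1, -1e16} and {1, 1, 1}, a plain loop loses the
middle term, while the compensated sum recovers it.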

Thanks,
Gilles

>
>
> Opinions welcome.
>
> Alex
>
>
> [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547
> [2] https://doi.org/10.1007/BF01397083
> [3] http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
> [4] https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements


Re: [collections] bloom filters comments

2020-01-08 Thread Gilles Sadowski

Le mer. 8 janv. 2020 à 15:15, Gary Gregory  a écrit :
>
> I think it is time to bring this PR in and make any adjustments within
> master beyond that. This will be quicker and simpler than going round and
> round for simple things like Javadoc tweaks and small non-functional
> changes (formatting, variable names, and so on.) I'll proceed with that
> tonight.

Design issues were raised on the ML: With no agreement and no opinions
other than Claude's and mine, things stayed where they were.

Gilles

>> [...]


Re: [geometry] Rename Transform to AffineTransform

2020-01-08 Thread Gilles Sadowski

Hi.

Le mer. 8 janv. 2020 à 04:39, Matt Juntunen
 a écrit :
>
> Gilles,
>
> > I thought that the question was how to replace "transform"...
>
> I should probably clarify. I want to change the name of the Transform class 
> to make it clear that it only represents transforms that preserve parallelism 
> (eg, affine transforms). The most obvious name would be AffineTransform

How about keeping "Transform" for the interface name and define a method
---CUT---
/**
 * Move here the doc explaining under what conditions this method can
 * return "true".
 */
boolean isAffine();
---CUT---
?

Gilles

> like I suggested but I want to make sure that no one objects to this for 
> design or mathematical reasons.
>
> > Anyways, what would be the issue(s) of a non-affine transform?
> > IOW why should implementations of "Transform" be restricted to affine
> > transforms?
>
> Instances of Transform are used to transform hyperplanes and 
> hyperplane-bounded regions. If the transform is not affine, then the computed 
> results will not be accurate.

I don't get that it is an "accuracy" issue. If some requirement is not met,
results will be plain wrong; so it depends on usage: when the transform
must be affine, the code being passed that instance should be able to
check whether it can use it for the intended purpose.
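
A minimal sketch of such a check, under simplifying assumptions: points are plain "double" values (1D), and the names "Transform1D", "Affine1D", "Squaring" and "Segment" are hypothetical illustrations, not part of the Commons Geometry API.

```java
/** Hypothetical 1D stand-in for the "Transform" interface (not the Commons Geometry API). */
interface Transform1D {
    /** Applies the transform to a coordinate. */
    double apply(double x);

    /** @return true if the transform has the form x -> a * x + b. */
    boolean isAffine();
}

/** Affine example: x -> a * x + b. */
final class Affine1D implements Transform1D {
    private final double a;
    private final double b;

    Affine1D(double a, double b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public double apply(double x) {
        return a * x + b;
    }

    @Override
    public boolean isAffine() {
        return true;
    }
}

/** Non-affine example: x -> x * x. */
final class Squaring implements Transform1D {
    @Override
    public double apply(double x) {
        return x * x;
    }

    @Override
    public boolean isAffine() {
        return false;
    }
}

/** Hypothetical region that only accepts affine transforms. */
final class Segment {
    final double start;
    final double end;

    Segment(double start, double end) {
        this.start = start;
        this.end = end;
    }

    /** Maps both end points; rejects transforms that do not declare themselves affine. */
    Segment transform(Transform1D t) {
        if (!t.isAffine()) {
            throw new IllegalArgumentException("Affine transform required");
        }
        return new Segment(t.apply(start), t.apply(end));
    }
}
```

With this layout the requirement is enforced by the code that needs it, while other callers remain free to use non-affine implementations.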

I wonder why the documented requirement that an "inverse transform
must exist" does not translate into a method
---CUT---
Transform getInverse();
---CUT---
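
For illustration only, here is how an inverse accessor could look for a 1D affine map x -> a * x + b. The class "InvertibleAffine1D" is a hypothetical name and the 1D restriction is an assumption; the actual interface would work on points and vectors.

```java
/** Hypothetical 1D affine transform x -> a * x + b that exposes its inverse. */
final class InvertibleAffine1D {
    private final double a;
    private final double b;

    InvertibleAffine1D(double a, double b) {
        if (a == 0) {
            // A zero scale collapses every point onto "b": no inverse exists.
            throw new IllegalArgumentException("Not invertible: a == 0");
        }
        this.a = a;
        this.b = b;
    }

    double apply(double x) {
        return a * x + b;
    }

    /** Inverse of y = a * x + b is x = (1 / a) * y - b / a. */
    InvertibleAffine1D getInverse() {
        return new InvertibleAffine1D(1 / a, -b / a);
    }
}
```

The documented requirement ("an inverse must exist") then becomes something the constructor checks and the caller can rely on.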

Regards,
Gilles

> -Matt
> 
> From: Gilles Sadowski 
> Sent: Tuesday, January 7, 2020 6:41 PM
> To: Commons Developers List 
> Subject: Re: [geometry] Rename Transform to AffineTransform
>
> On Tue, Jan 7, 2020 at 17:55, Matt Juntunen wrote:
> >
> > Gilles,
> >
> > > "AffineMap" (?)
> >
> > I think any name that doesn't include the word "transform" somehow would 
> > probably be confusing.
>
> This is supposed to be a synonym.[1]
> I thought that the question was how to replace "transform"...
>
> >
> > > Was the same "Transform" interface used in both the "euclidean" and the
> > "spherical" packages of "Commons Math"?
> >
> > Indirectly. SphericalPolygonsSet extended AbstractRegion, which included an 
> > applyTransform(Transform) method.
>
> So the "affine" requirement (in the doc) applied there too.
>
> Anyways, what would be the issue(s) of a non-affine transform?
> IOW why should implementations of "Transform" be restricted to affine
> transforms?
>
> Regards,
> Gilles
>
> [1] https://en.wikipedia.org/wiki/Affine_transformation
>
> > -Matt
> > 
> > From: Gilles Sadowski 
> > Sent: Tuesday, January 7, 2020 10:29 AM
> > To: Commons Developers List 
> > Subject: Re: [geometry] Rename Transform to AffineTransform
> >
> > Hello.
> >
> > On Tue, Jan 7, 2020 at 16:00, Matt Juntunen wrote:
> > >
> > > Hi all,
> > >
> > > The documentation for the o.a.c.geometry.core.Transform interface (which 
> > > comes from the original commons-math version) states that implementations 
> > > must be affine. Therefore, I think we should rename this interface to 
> > > AffineTransform to avoid confusion with other types of transforms, such 
> > > as projective transforms. Potential problems with this are
> > > - the fact that the JDK already has a class with the same name 
> > > (java.awt.geom.AffineTransform), and
> >
> > "AffineMap" (?)
> >
> > > - I'm not sure if the term "affine" can technically be applied to 
> > > non-Euclidean geometries, such as spherical geometry (there may be 
> > > nuances there that I'm not aware of).
> >
> > Was the same "Transform" interface used in both the "euclidean" and the
> > "spherical" packages of "Commons Math"?
> >
> > Regards,
> > Gilles
> >
> > > Anyone have any insight or opinions on this? I've created GEOMETRY-80 to 
> > > track this issue.
> > >
> > > Regards,
> > > Matt


Re: [geometry] Rename Transform to AffineTransform

2020-01-07 Thread Gilles Sadowski

On Tue, Jan 7, 2020 at 17:55, Matt Juntunen wrote:
>
> Gilles,
>
> > "AffineMap" (?)
>
> I think any name that doesn't include the word "transform" somehow would 
> probably be confusing.

This is supposed to be a synonym.[1]
I thought that the question was how to replace "transform"...

>
> > Was the same "Transform" interface used in both the "euclidean" and the
> "spherical" packages of "Commons Math"?
>
> Indirectly. SphericalPolygonsSet extended AbstractRegion, which included an 
> applyTransform(Transform) method.

So the "affine" requirement (in the doc) applied there too.

Anyways, what would be the issue(s) of a non-affine transform?
IOW why should implementations of "Transform" be restricted to affine
transforms?

Regards,
Gilles

[1] https://en.wikipedia.org/wiki/Affine_transformation

> -Matt
> 
> From: Gilles Sadowski 
> Sent: Tuesday, January 7, 2020 10:29 AM
> To: Commons Developers List 
> Subject: Re: [geometry] Rename Transform to AffineTransform
>
> Hello.
>
> On Tue, Jan 7, 2020 at 16:00, Matt Juntunen wrote:
> >
> > Hi all,
> >
> > The documentation for the o.a.c.geometry.core.Transform interface (which 
> > comes from the original commons-math version) states that implementations 
> > must be affine. Therefore, I think we should rename this interface to 
> > AffineTransform to avoid confusion with other types of transforms, such as 
> > projective transforms. Potential problems with this are
> > - the fact that the JDK already has a class with the same name 
> > (java.awt.geom.AffineTransform), and
>
> "AffineMap" (?)
>
> > - I'm not sure if the term "affine" can technically be applied to 
> > non-Euclidean geometries, such as spherical geometry (there may be nuances 
> > there that I'm not aware of).
>
> Was the same "Transform" interface used in both the "euclidean" and the
> "spherical" packages of "Commons Math"?
>
> Regards,
> Gilles
>
> > Anyone have any insight or opinions on this? I've created GEOMETRY-80 to 
> > track this issue.
> >
> > Regards,
> > Matt


Re: [geometry] Rename Transform to AffineTransform

2020-01-07 Thread Gilles Sadowski

Hello.

On Tue, Jan 7, 2020 at 16:00, Matt Juntunen wrote:
>
> Hi all,
>
> The documentation for the o.a.c.geometry.core.Transform interface (which 
> comes from the original commons-math version) states that implementations 
> must be affine. Therefore, I think we should rename this interface to 
> AffineTransform to avoid confusion with other types of transforms, such as 
> projective transforms. Potential problems with this are
> - the fact that the JDK already has a class with the same name 
> (java.awt.geom.AffineTransform), and

"AffineMap" (?)

> - I'm not sure if the term "affine" can technically be applied to 
> non-Euclidean geometries, such as spherical geometry (there may be nuances 
> there that I'm not aware of).

Was the same "Transform" interface used in both the "euclidean" and the
"spherical" packages of "Commons Math"?

Regards,
Gilles

> Anyone have any insight or opinions on this? I've created GEOMETRY-80 to 
> track this issue.
>
> Regards,
> Matt


Fwd: [Geometry] Class "Equivalency"

2020-01-04 Thread Gilles Sadowski

Forwarding to ML.

Gilles

P.S. Please don't send to me directly when the post is meant for
the list, as otherwise hitting "Reply" will move the conversation
off-list...

-- Forwarded message -
From: Gilles Sadowski
Date: Sat, Jan 4, 2020 at 19:54
Subject: Re: [Geometry] Class "Equivalency"
To: Matt Juntunen 


Hello.

On Sat, Jan 4, 2020 at 17:30, Matt Juntunen wrote:
>
> Gilles,
>
> I removed the Equivalency interface and pushed the changes to my current PR, 
> along with some changes related to the shape-generation classes (GEOMETRY-83).
>
> In regard to renaming "eq" to "equals", I personally prefer "eq" (short for 
> "equivalent") to make it more distinct from "equals".

I thought that it was short for "equals", so this is indeed confusing
and the method should actually be named "equivalent".

> The "eq" method does not satisfy the strict requirements for "equals", namely 
> transitivity and the relation to hashCode. (This is also why I don't think 
> "equals" should internally call "eq".)

OK.

> Having "eq" be an overload of "equals" would to me imply that they have the 
> same general properties, which is not the case.

Not so sure; but I don't have any strong argument at hand. :-)

However, are there unit tests and/or examples of use-cases
from which newbies can grasp when to use which?
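
As one possible illustration of the distinction (a sketch only: "FuzzyPoint" is a hypothetical class, and a plain "double" tolerance stands in for "DoublePrecisionContext"), a tolerance-based "eq" is not transitive, whereas "equals" must be:

```java
/** Hypothetical 1D point with both a strict "equals" and a fuzzy "eq". */
final class FuzzyPoint {
    final double x;

    FuzzyPoint(double x) {
        this.x = x;
    }

    /** Fuzzy comparison: true if the coordinates differ by at most "eps". */
    boolean eq(FuzzyPoint other, double eps) {
        return Math.abs(x - other.x) <= eps;
    }

    /** Strict comparison, consistent with hashCode. */
    @Override
    public boolean equals(Object o) {
        return o instanceof FuzzyPoint
            && Double.compare(x, ((FuzzyPoint) o).x) == 0;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(x);
    }

    public static void main(String[] args) {
        FuzzyPoint a = new FuzzyPoint(0.0);
        FuzzyPoint b = new FuzzyPoint(0.75);
        FuzzyPoint c = new FuzzyPoint(1.5);
        double eps = 1.0;
        // a ~ b and b ~ c, yet a and c are further apart than eps.
        System.out.println(a.eq(b, eps) + " " + b.eq(c, eps) + " " + a.eq(c, eps));
    }
}
```

Objects that are "eq" may also have different hash codes, so they cannot stand in for each other as keys in a "HashMap"; that is the kind of use-case where only "equals" applies.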

Thanks,
Gilles

>
> -Matt
> 
> From: Gilles Sadowski 
> Sent: Saturday, January 4, 2020 6:26 AM
> To: Commons Developers List 
> Subject: Re: [Geometry] Class "Equivalency"
>
> Hello.
>
> 2020-01-04 6:10 UTC+01:00, Matt Juntunen :
> > Gilles,
> >
> > I took another look at the code and it turns out we can easily remove the
> > entire Equivalency interface and just use methods of the form "eq(T,
> > DoublePrecisionContext)", exactly the same way that the VectorXD classes do
> > it now. This way, it's a little more clear what precision is being used for
> > the comparison and we don't have an extra interface floating around. The
> > original behavior can also be replicated with just "obj.eq(other,
> > obj.getPrecision())".
> >
> > WDYT?
>
> +1
>
> For further consolidation, could we rename "eq" to "equals", and
> make "equals(Object)" call "equals(T, DoublePrecisionContext)"?
>
> Gilles
>
> >
> > -Matt
> > 
> > From: Gilles Sadowski 
> > Sent: Friday, January 3, 2020 8:12 PM
> > To: Commons Developers List 
> > Subject: Re: [Geometry] Class "Equivalency"
> >
> > Hi.
> >
> > 2020-01-03 22:02 UTC+01:00, Matt Juntunen :
> >> Gilles,
> >>
> >> Yes, users are responsible for handling their own PrecisionContexts. The
> >> idea behind the Equivalency interface was to provide an easy way to
> >> perform
> >> "fuzzy" comparisons between objects. The interface itself does not define
> >> what the comparison involves. Classes that implement the interface (such
> >> as
> >> Line and Plane) decide what "fuzzy" means in their particular case. All
> >> of
> >> the current implementations of this interface contain a requirement that
> >> PrecisionContexts used by the compared objects (and used in floating
> >> point
> >> comparisons in the eq() method itself) be exactly equal in order for the
> >> objects to be considered equivalent. This is done to make the operation
> >> commutative, so that if "a.eq(b)" is true, then "b.eq(a)" is also true.
> >> This
> >> is documented on each implementing class.
> >
> > My impression is that it is more akin to "strict" equality rather
> > than fuzzy.  Looking at the code, e.g. for "Line3D", it doesn't
> > strike as obvious what the different use-cases are for "equals"
> > and "eq".
> > In effect, I could imagine that "fuzzy" equality could not be
> > commutative (random, perhaps naive, thought): an instance with
> > low precision would indicate that some result (where precision
> > could have been lost) would consider itself equal to another
> > instance (that may have been set with higher precision).
> >
> > Gilles
> >
> >>
> >> -Matt
> >> 
> >> From: Gilles Sadowski 
> >> Sent: Friday, January 3, 2020 12:56 PM
> >> To: Commons Developers List 
>

Re: [Geometry] Class "Equivalency"

2020-01-04 Thread Gilles Sadowski

Hello.

2020-01-04 6:10 UTC+01:00, Matt Juntunen :
> Gilles,
>
> I took another look at the code and it turns out we can easily remove the
> entire Equivalency interface and just use methods of the form "eq(T,
> DoublePrecisionContext)", exactly the same way that the VectorXD classes do
> it now. This way, it's a little more clear what precision is being used for
> the comparison and we don't have an extra interface floating around. The
> original behavior can also be replicated with just "obj.eq(other,
> obj.getPrecision())".
>
> WDYT?

+1

For further consolidation, could we rename "eq" to "equals", and
make "equals(Object)" call "equals(T, DoublePrecisionContext)"?

Gilles

>
> -Matt
> 
> From: Gilles Sadowski 
> Sent: Friday, January 3, 2020 8:12 PM
> To: Commons Developers List 
> Subject: Re: [Geometry] Class "Equivalency"
>
> Hi.
>
> 2020-01-03 22:02 UTC+01:00, Matt Juntunen :
>> Gilles,
>>
>> Yes, users are responsible for handling their own PrecisionContexts. The
>> idea behind the Equivalency interface was to provide an easy way to
>> perform
>> "fuzzy" comparisons between objects. The interface itself does not define
>> what the comparison involves. Classes that implement the interface (such
>> as
>> Line and Plane) decide what "fuzzy" means in their particular case. All
>> of
>> the current implementations of this interface contain a requirement that
>> PrecisionContexts used by the compared objects (and used in floating
>> point
>> comparisons in the eq() method itself) be exactly equal in order for the
>> objects to be considered equivalent. This is done to make the operation
>> commutative, so that if "a.eq(b)" is true, then "b.eq(a)" is also true.
>> This
>> is documented on each implementing class.
>
> My impression is that it is more akin to "strict" equality rather
> than fuzzy.  Looking at the code, e.g. for "Line3D", it doesn't
> strike as obvious what the different use-cases are for "equals"
> and "eq".
> In effect, I could imagine that "fuzzy" equality could not be
> commutative (random, perhaps naive, thought): an instance with
> low precision would indicate that some result (where precision
> could have been lost) would consider itself equal to another
> instance (that may have been set with higher precision).
>
> Gilles
>
>>
>> -Matt
>> 
>> From: Gilles Sadowski 
>> Sent: Friday, January 3, 2020 12:56 PM
>> To: Commons Developers List 
>> Subject: [Geometry] Class "Equivalency"
>>
>> Hello.
>>
>> I'm wary about making that class part of the public API.
>> I thought that the original idea was that users would be
>> responsible for how they handle the "PrecisionContext".
>> However, it seems that "Equivalency" requires equality
>> of "PrecisionContext" instances (not the "double" value).
>> This is non-obvious and IMHO deserves some explanation
>> in the Javadoc and user guide.
>>
>> Regards,
>> Gilles


Re: [Geometry] Class "Equivalency"

2020-01-03 Thread Gilles Sadowski

Hi.

2020-01-03 22:02 UTC+01:00, Matt Juntunen :
> Gilles,
>
> Yes, users are responsible for handling their own PrecisionContexts. The
> idea behind the Equivalency interface was to provide an easy way to perform
> "fuzzy" comparisons between objects. The interface itself does not define
> what the comparison involves. Classes that implement the interface (such as
> Line and Plane) decide what "fuzzy" means in their particular case. All of
> the current implementations of this interface contain a requirement that
> PrecisionContexts used by the compared objects (and used in floating point
> comparisons in the eq() method itself) be exactly equal in order for the
> objects to be considered equivalent. This is done to make the operation
> commutative, so that if "a.eq(b)" is true, then "b.eq(a)" is also true. This
> is documented on each implementing class.

My impression is that it is more akin to "strict" equality rather
than fuzzy.  Looking at the code, e.g. for "Line3D", it doesn't
strike me as obvious what the different use-cases are for "equals"
and "eq".
In effect, I could imagine that "fuzzy" equality need not be
commutative (a random, perhaps naive, thought): an instance with
low precision (e.g. some result where precision could have been
lost) would consider itself equal to another instance (that may
have been set with higher precision), but not necessarily the
other way around.
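
To make that thought concrete, a small sketch (hypothetical class "Measured"; each instance carries its own tolerance, which is an assumption standing in for a per-instance "PrecisionContext"):

```java
/** Hypothetical value that carries its own comparison tolerance. */
final class Measured {
    final double value;
    final double eps;

    Measured(double value, double eps) {
        this.value = value;
        this.eps = eps;
    }

    /** Fuzzy comparison using only the tolerance of "this": not symmetric. */
    boolean eq(Measured other) {
        return Math.abs(value - other.value) <= eps;
    }

    /** Variant that also requires identical tolerances: symmetric by construction. */
    boolean eqSamePrecision(Measured other) {
        return eps == other.eps && eq(other);
    }

    public static void main(String[] args) {
        Measured coarse = new Measured(1.0, 0.5);
        Measured fine = new Measured(1.25, 0.125);
        // The coarse instance accepts the fine one, but not vice versa.
        System.out.println(coarse.eq(fine) + " " + fine.eq(coarse));
    }
}
```

Requiring equal tolerances, as in "eqSamePrecision", is what restores symmetry, at the cost of making the comparison stricter.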

Gilles

>
> -Matt
> 
> From: Gilles Sadowski 
> Sent: Friday, January 3, 2020 12:56 PM
> To: Commons Developers List 
> Subject: [Geometry] Class "Equivalency"
>
> Hello.
>
> I'm wary about making that class part of the public API.
> I thought that the original idea was that users would be
> responsible for how they handle the "PrecisionContext".
> However, it seems that "Equivalency" requires equality
> of "PrecisionContext" instances (not the "double" value).
> This is non-obvious and IMHO deserves some explanation
> in the Javadoc and user guide.
>
> Regards,
> Gilles
>


[Geometry] Class "Equivalency"

2020-01-03 Thread Gilles Sadowski

Hello.

I'm wary about making that class part of the public API.
I thought that the original idea was that users would be
responsible for how they handle the "PrecisionContext".
However, it seems that "Equivalency" requires equality
of "PrecisionContext" instances (not the "double" value).
This is non-obvious and IMHO deserves some explanation
in the Javadoc and user guide.

Regards,
Gilles


Re: [numbers] Complex missing some C++ standards

2019-12-29 Thread Gilles Sadowski

Hi.

On Sun, Dec 29, 2019 at 23:25, Alex Herbert wrote:
>
> I’ve dropped the static equals methods and reciprocal and pushed the updated 
> class with MathJax.
>
> I put MathJax in whenever possible. This may be a bit too much. The rendered 
> javadoc looks good but the javadoc rendered by my IDE without MathJax support 
> can be very unreadable.

One cannot have one's cake and eat it too.

>
> Have a look at the built javadocs and through an IDE and let me know your 
> opinions on the current usage.

I think that it's fine, when looking with a pager.
I don't use an IDE. :-}

>
> It may be better to drop some use of MathJax such as \( z \) for {@code z} 
> which would make the code more readable in an IDE when programming.

Sure. [I seem to recall having mentioned that such usage of {@code} was fine.]

> I’ve not explicitly laid out the latex equations for unparsed readability so 
> some improvements could be made. However some equations are multi-line which 
> gets wrapped to a single line if pure HTML without MathJax. For example see 
> atanh and acos. I do not know how to lay this out to make it readable without 
> MathJax.

No need IMO.

> So perhaps a note in the class javadoc that the use of MathJax is required to 
> read the formatted equations.

This is a general comment (i.e. valid for all classes in this component).
But isn't MathJax active when browsing the "Commons" site?

> On another c++ note the documentation for the C99 functions on C++ reference 
> lists the special return values, e.g. [1]. This is similar to the special 
> return values listed in java.lang.Math for functions, e.g. [2]. I think it
> would be good to add these to the javadoc. It should be a simple cut and 
> reformat from the ISO C99 Annex G that the class has been tested against.

How about including a link to the respective C++ doc pages?

Regards,
Gilles

>
> [1] https://en.cppreference.com/w/c/numeric/complex/cacos
> [2] https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#atan2-double-double-
>
> Alex
>


Re: [collections] bloom filters comments

2019-12-29 Thread Gilles Sadowski

On Sun, Dec 29, 2019 at 15:30, Claude Warren wrote:
>
> If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter
> interface and made a stand-alone class would the name change or is the fact
> that it is in the o.a.c.c.bloomfilter package enough?
>
> I prefer the name Shape to BloomFilterShape.

If "Shape" is only used for "BloomFilter" (as the suggestion above
seems to indicate), why not declare it as a nested interface:
---CUT---
public interface BloomFilter {
    // ...

    public interface Shape {
        // ...
    }
}
---CUT---
?
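
A short sketch of how client code would see the nested type. The members "getShape()" and "getNumberOfBits()" and the class "SimpleShape" are hypothetical, used only to make the example compile:

```java
/** Sketch of the nested-interface layout; member names are illustrative assumptions. */
interface BloomFilter {
    /** @return the shape this filter was built with. */
    Shape getShape();

    /** Nested type: its full name is "BloomFilter.Shape". */
    interface Shape {
        int getNumberOfBits();
    }
}

/** Hypothetical implementation, referring to the nested type by its qualified name. */
final class SimpleShape implements BloomFilter.Shape {
    private final int numberOfBits;

    SimpleShape(int numberOfBits) {
        this.numberOfBits = numberOfBits;
    }

    @Override
    public int getNumberOfBits() {
        return numberOfBits;
    }
}
```

Code that imports the nested type (e.g. an import of "BloomFilter.Shape" from its package) can still refer to it simply as "Shape", so the short name is kept without adding a separate top-level class.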

Regards,
Gilles

>
> Claude
>
> On Sat, Dec 28, 2019 at 9:06 AM Claude Warren  wrote:
>
> > Once the interface is extracted and reduced to the minimum necessary the
> > following methods are removed:
> >
> > orCardinality() -- we have andCardinality() and xorCardinality() this was
> > included for completeness.
> >
> > isFull() -- true if all the bits in the vector are on. A convenience
> > method used to short circuit some logic.
> >
> > I think these 2 methods should go into the BloomFilter interface.
> >
> >
> > Set operations:
> >
> > jaccardSimilarity -- a measure of similarity in sets with range [0,1]
> >
> > jaccardDistance -- defined as 1 - jaccardSimilarity
> >
> > cosineSimilarity -- a measure of similarity with range  [0,1]
> >
> > cosineDistance -- defined as 1 - cosineSimilarity
> >
> > estimateSize -- Set operation estimate of the number of items that were
> > placed in the filter. (Requires Shape)
> >
> > estimateUnionSize -- Set operation estimate of the number of items that
> > would be represented by the merging of two filters. (Requires Shape)
> >
> > estimateIntersectionSize -- Set operations estimate of the number of items
> > in the intersection. (Requires Shape)
> >
> > Perhaps it makes sense to move the Set operations into their own class
> > with static methods?  The set operations all operate on 2 Bloom filters.
> > Moving them would clarify the AbstractBloomFilter class.
> >
> > Claude
> >
> >
> > On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory 
> > wrote:
> >
> >> On Fri, Dec 27, 2019 at 11:36 AM Claude Warren  wrote:
> >>
> >> > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory 
> >> > wrote:
> >> >
> >> > > Hi Claude and all:
> >> > >
> >> > > Here are a couple of comments on the bloom filter PR.
> >> > >
> >> > > 10,100 ft level comment we do not have to worry about today: Before
> >> the
> >> > > release, we might want to split Commons Collection into a multi-module
> >> > > project and have the BF code in it own module. The _main_ reason for
> >> this
> >> > > is that it will allow the dependency on Commons Codec to be a
> >> non-optional
> >> > > dependency. Optional dependencies are a bit of a pain IMO, especially
> >> > when
> >> > > you deal with large stacks. You end up sucking in everything that is
> >> > > optional when you deliver an app because you do not know for certain
> >> > what's
> >> > > really required at runtime.
> >> > >
> >> > > Closer to the ground: I would make BloomFilter an interface and rename
> >> > > the current BloomFilter class AbstractBloomFilter implements
> >> BloomFilter.
> >> > > This will allow the user and maintainer to see what is truly public.
> >> > >
> >> >
> >> > I have done this (not checked in) but the net effect is that all the
> >> public
> >> > methods in the AbstractBloomFilter class are reflected in the
> >> BloomFilter
> >> > interface and the Shape class has moved to the Interface as well.
> >> >
> >>
> >> OK, I think I like BloomFilter as an interface especially since it is used
> >> as an argument in methods. I'll leave it to you as to whether Shape needs
> >> to be folded in. I did notice that Shape is an argument in a few places.
> >> Might we lose some focus and abstraction by folding Shape into the
> >> BloomFilter interface?
> >>
> >>
> >> > I can remove several methods from BloomFilter that are not strictly
> >> > necessary for this code to function.  I am and have been reticent to do
> >> so
> >> > since there are a number of home-grown libraries used by various
> >> > researchers that provide these functions.  Bu

Re: [NUMBERS-96][GSoC-2020] Port and redevelop interpolation libraries from commons-math

2019-12-28 Thread Gilles Sadowski

Hi.

2019-12-28 19:59 UTC+01:00, Rishabh Budhouliya :
> Hi everyone,
>
> I would like to know two things:
>
> 1) Which ported module/classes should I read and compare from commons-math
> to understand the architectural decisions taken to use lambda functions,
> streams etc all the FP paradigms in common-numbers?

The overall idea is to have a "modern" math library, i.e. to follow
the evolution of the JDK so that developers can use modern
syntax and take advantage of the new features.

>
> 2) Is this project still up for GSoC 2020?

We don't know at this point whether people from Commons will
be willing to dedicate time for this.
[However, it's reasonable to expect that GSoC candidates who'd
have accomplished prior work would be favorably considered (but
no promise!).]

> I would like to port the interpolation libraries from commons-math to
> commons-numbers

Have you considered
   https://issues.apache.org/jira/projects/NUMBERS/issues/NUMBERS-69
?

> but definitely need some guidance as I am new to
> Functional Programming.

I think that the intent was to indicate that the features
from the "java.util.function" package should be used
in place of the function objects defined in Commons Math.
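
As a rough sketch of what that could mean for interpolation (the class "LinearInterpolation" and its "of" method are hypothetical, not an existing Commons API): the interpolating function is returned as a JDK "DoubleUnaryOperator" rather than as a library-specific type such as Commons Math's "UnivariateFunction".

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical piecewise-linear interpolator returning a JDK functional interface. */
final class LinearInterpolation {
    private LinearInterpolation() {}

    /**
     * @param x strictly increasing abscissae (at least two)
     * @param y ordinates, same length as x
     * @return a function defined on [x[0], x[x.length - 1]]
     */
    static DoubleUnaryOperator of(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two arrays of equal length >= 2");
        }
        final double[] xs = x.clone();
        final double[] ys = y.clone();
        for (int i = 1; i < xs.length; i++) {
            if (!(xs[i] > xs[i - 1])) {
                throw new IllegalArgumentException("Abscissae must be strictly increasing");
            }
        }
        return v -> {
            // The negated test also rejects NaN.
            if (!(v >= xs[0] && v <= xs[xs.length - 1])) {
                throw new IllegalArgumentException("Out of range: " + v);
            }
            final int i = Arrays.binarySearch(xs, v);
            if (i >= 0) {
                return ys[i]; // Exact knot.
            }
            final int hi = -i - 1; // Insertion point: first knot greater than v.
            final int lo = hi - 1;
            final double t = (v - xs[lo]) / (xs[hi] - xs[lo]);
            return ys[lo] + t * (ys[hi] - ys[lo]);
        };
    }
}
```

The result composes with other JDK functions without adapters, e.g. "LinearInterpolation.of(x, y).andThen(Math::sqrt)".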

> I hope this is the appropriate channel to raise such requests, I apologize
> if I spammed some guys!

It's the right place.

Thanks for your interest,
Gilles

>
> Regards,
> Rishabh Budhouliya
>


Re: [numbers] Complex missing some C++ standards

2019-12-28 Thread Gilles Sadowski

Hi.

2019-12-29 1:15 UTC+01:00, Alex Herbert :
>
>
>> On 21 Dec 2019, at 11:42, Gilles Sadowski  wrote:
>>
>>> ...
>>
>> So, would you suggest that no "Number"-like class should ever throw
>> an exception (but instead return the equivalent of "Double.NaN")?
>
> Yes. As it was the method could throw for some invalid input and not others.
> This is inconsistent. Changing to return a NaN equivalent is more neutral.
> Through the rest of the code all the computations that may end up computing
> NaN just return NaN. There is no raising of exceptions even when the ISO C99
> standard states that an exception may be raised, i.e. these do not throw
> ArithmeticException. Removing all exceptions from the class could be stated
> as a design decision in the class javadoc. The only exceptions are from the
> parse(String) method.

+1
Could also be considered as a simplification (no wondering about
which exception to raise...).
An acceptable rationale for not throwing is that caller code can always
do the checks and act accordingly (throw or propagate NaN).
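
A minimal sketch of the two caller-side options, using "Math.sqrt" (which returns NaN for negative input) as the non-throwing computation; the class "NaNHandling" and its method names are hypothetical:

```java
/** Hypothetical helper showing both ways a caller can treat a NaN result. */
final class NaNHandling {
    private NaNHandling() {}

    /** Propagates NaN: the library-style, never-throwing behaviour. */
    static double lenientSqrt(double x) {
        return Math.sqrt(x);
    }

    /** Caller-side check: converts a NaN result into an exception. */
    static double strictSqrt(double x) {
        final double r = Math.sqrt(x);
        if (Double.isNaN(r)) {
            throw new ArithmeticException("sqrt is undefined for " + x);
        }
        return r;
    }
}
```

The non-throwing version stays usable inside longer computations, and the strict wrapper is a few lines for callers who prefer to fail early.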

>
> I have been documenting the class using MathJax. This has raised a few more
> discrepancies with the c++ standards:
>
> 1. Static constructors
>
> Complex:
>
> ofCartesian(x, y)
> ofPolar(rho, theta)
>
> No public constructor.
>
> C++
>
> complex(x, y) // This is a public class constructor for type
> 
> polar(rho, theta)
>
> Should we change to the same name,

-0

> add synonyms

-0

> just accept the API
> difference?
>
> Sticking to the VALJO design would leave it as is and document that ofPolar
> matches the functionality of the ISO C++ polar method.

+1
Priority (IMO) would be to try and establish consistency within Java.

>
>
> 2. Reciprocal
>
> I had to change the implementation of reciprocal to call divide,
> effectively:
>
> Complex.ONE.divide(this)
>
> This uses scaling for the divisor and can compute values that the previous
> version could not such as:
>
> Complex.ofCartesian(Double.MAX_VALUE, Double.MAX_VALUE).reciprocal()
>
> It also better handles NaN and infinite edge cases.

Great.

>
> This method seems redundant. The origin in CM3 was due to Complex
> implementing the FieldElement interface. This has reciprocal() as a required
> method.
>
> It also has multiply(int) hence why that method was originally in the
> numbers version of Complex. This has now been dropped as it is redundant
> with multiply(double). Since we do not have to support the FieldElement
> interface I would suggest dropping reciprocal as well.

+1

>
> 3. Equals
>
> Why are there so many static equals functions using
> o.a.c.numbers.core.Precision with ULP and deltas?
>
> boolean equals(Complex x, Complex y, int maxUlps)
>
> @return {@code true} if there are fewer than {@code maxUlps} floating
>  * point values between the real or imaginary, respectively, parts of
> {@code x}
>  * and {@code y}.
>
>
> boolean equals(Complex x, Complex y)
>
> @return equals(x, y, 1)
>
>
> boolean equals(Complex x, Complex y, double eps)
>
> @return {@code true} if the values are two adjacent floating point
>  * numbers or they are within range of each other.
>
>
> boolean equalsWithRelativeTolerance(Complex x, Complex y,  double eps)
>
> {@code true} if the values are two adjacent floating point
>  * numbers or they are within range of each other.
>
>
> These are all helper methods. None are required for the test suite. I do not
> see the need to have them in the core Complex class. The methods do not
> cover all possible ways to measure equality, just some using the Precision
> class. I would drop these and leave it to a user to decide how to measure
> equality.

+1

>
> However I do note that similar methods are in the CM3 Complex class. Leaving
> them here would make porting to numbers easy. I think that before the 1.0
> release  is the time to decide if they should be maintained in Complex, in
> another class such as ComplexPrecision for legacy support of porting from
> CM3, or dropped. In the later case a note could be added to the package
> javadoc to state how to replicate the equals functionality from CM3 using
> numbers.Precision.

Let's drop, and wait for actual use cases.

Thanks,
Gilles

>
> Alex


Re: [numbers] Complex missing some C++ standards

2019-12-21 Thread Gilles Sadowski

Hi.

2019-12-21 1:07 UTC+01:00, Alex Herbert :
>
>
>> On 20 Dec 2019, at 23:45, Gilles Sadowski  wrote:
>>
>> On Sat, Dec 21, 2019 at 00:05, Alex Herbert wrote:
>>>
>>> Looking at the c++ standard [1] we are missing this function:
>>>
>>> norm() = a^2 + b^2
>>>
>>> The field norm of the complex, or the square of the absolute. An example
>>> on C++ reference states that this is faster for comparing magnitudes for
>>> ranking as it avoids the sqrt() required in abs().
>>>
>>> z.abs() > y.abs()  ==  z.norm() > y.norm()
>>>
>>>
>>> I suggest this is added to comply with the standard.
>>
>> +1
>>
>>>
>>>
>>> It seems odd to me that the constructor ofPolar throws an exception. It
>>> does this when the magnitude (rho) for the complex is negative. However
>>> if the magnitude is NaN it will not throw an exception and will end up
>>> returning NaN. I think this should be changed to return NaN for negative
>>> magnitude and avoid exceptions. This is basically stating that the polar
>>> representation you used is invalid and so in the Cartesian representation
>>> it will be (nan, nan).
>>>
>>> The C++ standard on this was previously vague and was clarified in [2]:
>>>
>>> “
>>> -?- Requires: rho shall be non-negative and non-NaN. theta shall be
>>> finite.
>>>
>>> -9- Returns: The complex value corresponding to a complex number whose
>>> magnitude is rho and whose phase angle is theta.
>>> “
>>>
>>> The assumption is that abs(polar(rho, theta)) == rho.
>>>
>>> If this cannot be ensured then polar(rho, theta) is undefined and we
>>> return NaN.
>>>
>>>
>>> Note that if theta is finite and rho is non-negative and non-nan:
>>>
>>> x = rho * Math.cos(theta)
>>> y = rho * Math.sin(theta)
>>>
>>> In the event that sin(theta) is zero (i.e. theta is zero) then inf * 0 is
>>> NaN. In this case the complex could either be:
>>>
>>> (Inf, nan) or (inf, 0)
>>>
>>> I have tried 2 c++ implementations and both return (inf, nan). The c++
>>>  header I have found would return (inf, 0). The same header also
>>> corrects if cos(theta) is zero however in Java cos(pi/2) is not zero so
>>> this is not an issue.
>>>
>>> Note:
>>> If the result is (inf, nan) then abs((inf, nan)) should return inf to
>>> satisfy the contract abs(polar(rho, theta)) == rho. This is currently
>>> true as abs() uses Math.hypot(x, y) which will return positive infinity
>>> if either argument is infinite. So I do not think it matters. An infinite
>>> is infinite even when the other part is nan.
>>>
>>>
>>> I suggest we update ofPolar to not throw exceptions and return NAN unless
>>> theta is finite and rho is non-negative and non-nan.
>>
>> +0
>> Not sure how useful it is to instantiate a useless object.
>
> It would not be instantiated. It can use the singleton NaN. But I do see
> your point. Something has gone wrong so raise it as a problem.
>
> Looking at various implementations the polar(rho, theta) method is used when
> computing results in polar coordinates to then return a result. For example
> this is used in various sqrt() implementations as the equivalent of:
>
> Complex.ofPolar(Math.sqrt(z.abs()), z.arg() / 2)
>
> In this case it would not be nice to throw an exception for a complex z
> which is NaN. Instead return NaN as the sqrt of NaN.
>
> I thought it would be more in line with the c++ standard to not throw
> exceptions. I also think that returning NaN is more in line with how
> java.lang.Math handles invalid cases.

So, would you suggest that no "Number"-like class should ever throw
an exception (but instead return the equivalent of "Double.NaN")?

Perhaps that's the most neutral approach (careful users will check
the return value; others will have to use a debugger in order to find
the cause of the failure).
In the absence of other arguments, it's indeed safer to adopt the
same conventions as C++.
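
To make the proposed convention concrete, here is a minimal sketch of a
non-throwing polar factory. It is not the Commons Numbers code: the class
"PolarSketch" and the use of a plain {re, im} array are assumptions made
only for illustration.

```java
final class PolarSketch {
    private PolarSketch() {}

    /**
     * Hypothetical non-throwing polar conversion.
     * Returns {re, im}, or {NaN, NaN} unless rho is non-negative and
     * non-NaN and theta is finite.
     */
    static double[] ofPolar(double rho, double theta) {
        // The negated comparison is also true when rho is NaN.
        if (!(rho >= 0) || !Double.isFinite(theta)) {
            return new double[] {Double.NaN, Double.NaN};
        }
        return new double[] {rho * Math.cos(theta), rho * Math.sin(theta)};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(ofPolar(2, 0)));
        System.out.println(java.util.Arrays.toString(ofPolar(-1, 0)));
        System.out.println(java.util.Arrays.toString(ofPolar(Double.POSITIVE_INFINITY, 0)));
    }
}
```

With an infinite rho and theta equal to zero, the imaginary part is
inf * 0, i.e. NaN, which is the (inf, nan) case discussed in the quoted
text; Math.hypot still reports an infinite magnitude for that pair.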

Regards,
Gilles

>>>
>>> Alex
>>>
>>>
>>> [1] http://www.cplusplus.com/reference/complex/
>>> [2] https://cplusplus.github.io/LWG/issue2459

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [numbers] Complex missing some C++ standards

2019-12-20 Thread Gilles Sadowski

Le sam. 21 déc. 2019 à 00:05, Alex Herbert  a écrit :
>
> Looking at the c++ standard [1] we are missing this function:
>
> norm() = a^2 + b^2
>
> The field norm of the complex, or the square of the absolute. An example on 
> C++ reference states that this is faster for comparing magnitudes for ranking 
> as it avoids the sqrt() required in abs().
>
> z.abs() > y.abs()  ==  z.norm() > y.norm()
>
>
> I suggest this is added to comply with the standard.

+1
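
A minimal sketch of the ranking use-case mentioned above, assuming
standalone helper methods on raw (re, im) pairs; the class name
"NormRanking" and the method names are hypothetical and not part of the
Complex API.

```java
final class NormRanking {
    private NormRanking() {}

    /** Field norm: a^2 + b^2, the square of the absolute value. */
    static double norm(double re, double im) {
        return re * re + im * im;
    }

    /** Absolute value computed with Math.hypot. */
    static double abs(double re, double im) {
        return Math.hypot(re, im);
    }

    /** True if (re1, im1) has a strictly larger magnitude than (re2, im2). */
    static boolean hasLargerMagnitude(double re1, double im1,
                                      double re2, double im2) {
        // Comparing the norms avoids the square root needed by abs().
        return norm(re1, im1) > norm(re2, im2);
    }

    public static void main(String[] args) {
        System.out.println(norm(3, 4));
        System.out.println(abs(3, 4));
        System.out.println(hasLargerMagnitude(3, 4, 1, 1));
    }
}
```

One caveat of this naive form: the squares can overflow to infinity for
very large components, where Math.hypot would still return a finite value.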

>
>
> It seems odd to me that the constructor ofPolar throws an exception. It does 
> this when the magnitude (rho) for the complex is negative. However if the 
> magnitude is NaN it will not throw an exception and will end up returning 
> NaN. I think this should be changed to return NaN for negative magnitude and 
> avoid exceptions. This is basically stating that the polar representation you 
> used is invalid and so in the Cartesian representation it will be (nan, nan).
>
> The C++ standard on this was previously vague and was clarified in [2]:
>
> “
> -?- Requires: rho shall be non-negative and non-NaN. theta shall be finite.
>
> -9- Returns: The complex value corresponding to a complex number whose 
> magnitude is rho and whose phase angle is theta.
> “
>
> The assumption is that abs(polar(rho, theta)) == rho.
>
> If this cannot be ensured then polar(rho, theta) is undefined and we return 
> NaN.
>
>
> Note that if theta is finite and rho is non-negative and non-nan:
>
> x = rho * Math.cos(theta)
> y = rho * Math.sin(theta)
>
> In the event that sin(theta) is zero (i.e. theta is zero) then inf * 0 is 
> NaN. In this case the complex could either be:
>
> (Inf, nan) or (inf, 0)
>
> I have tried 2 c++ implementations and both return (inf, nan). The c++ 
>  header I have found would return (inf, 0). The same header also 
> corrects if cos(theta) is zero however in Java cos(pi/2) is not zero so this 
> is not an issue.
>
> Note:
> If the result is (inf, nan) then abs((inf, nan)) should return inf to satisfy 
> the contract abs(polar(rho, theta)) == rho. This is currently true as abs() 
> uses Math.hypot(x, y) which will return positive infinity if either argument 
> is infinite. So I do not think it matters. An infinite is infinite even when 
> the other part is nan.
>
>
> I suggest we update ofPolar to not throw exceptions and return NAN unless 
> theta is finite and rho is non-negative and non-nan.

+0
Not sure how useful it is to instantiate a useless object.

Regards,
Gilles

>
> Alex
>
>
> [1] http://www.cplusplus.com/reference/complex/
> [2] https://cplusplus.github.io/LWG/issue2459
>
>


Re: [numbers] Complex.ofImaginary and multiplyI

2019-12-12 Thread Gilles Sadowski

Le jeu. 12 déc. 2019 à 16:23, Alex Herbert  a écrit :
>
>
> > OK. I'll remove ofReal() and ofImaginary().
> >
> > They can always be added later.
>
> The same may apply to square():
>
> public Complex square() {
>  return multiply(this);
> }

Sure.

>
> It could be defined differently:
>
> re = ac - bd = aa - bb = (a-b)(a+b)
> im = ad + bc = ab + ba = 2 ab
>
> public Complex square() {
>  return new Complex((real-imaginary)*(real+imaginary),
> 2*real*imaginary);
> }
>
> But this does not account for the overflow protection and handling of
> special cases done in multiply. So I'd rather leave it calling
> multiply(this). So the square() method seems redundant and I would
> recommend dropping it.

+1
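
For illustration, a minimal sketch of the alternative formula quoted
above, on raw (a, b) pairs. The class "SquareSketch" is hypothetical; it
only shows what the formula computes when no special cases are handled.

```java
final class SquareSketch {
    private SquareSketch() {}

    /**
     * Naive square of (a + b i): re = (a - b)(a + b), im = 2ab.
     * No overflow protection and no handling of infinite or NaN parts.
     */
    static double[] squareNaive(double a, double b) {
        return new double[] {(a - b) * (a + b), 2 * a * b};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(squareNaive(3, 4)));
        System.out.println(java.util.Arrays.toString(
            squareNaive(Double.POSITIVE_INFINITY, 0)));
    }
}
```

For an infinite real part and a zero imaginary part, the imaginary result
is inf * 0, i.e. NaN; this is the kind of special case that the formula
leaves unhandled.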

>
> It was originally used in computations. Now it is not.

Thanks,
Gilles


Re: [numbers] Complex.ofImaginary and multiplyI

2019-12-12 Thread Gilles Sadowski

Le jeu. 12 déc. 2019 à 15:20, Alex Herbert  a écrit :
>
>
> On 12/12/2019 13:50, Gilles Sadowski wrote:
> > Hello.
> >
> > Le jeu. 12 déc. 2019 à 10:04, Alex Herbert  a 
> > écrit :
> >> There is a factory constructor:
> >>
> >> Complex.ofReal(double)
> >>
> >> For completeness I think we should add:
> >>
> >> Complex.ofImaginary(double)
> > I wonder whether we should remove "ofReal".
> > It's a shortcut that does not save a lot of typing, IMHO not worth adding
> > to the API.
>
> I like the ofReal() constructor. I was leaning on the other side for
> completeness. So I already added ofImaginary(). But both could be removed.
>
> It allows construction of a Complex from a double using lambda
> functions, e.g. for streams:
>
> List<Complex> numbers = Arrays.stream(new double[] {0, 1, 2})
>  .mapToObj(Complex::ofImaginary)
>  .collect(Collectors.toList());
>
> It may be useful and does not add much bloat to the API.

Yet, I'd rather wait until we have a compelling use-case.
It's easy to add to the API; impossible to remove from it...

The above example falls in the realm of the "ComplexList"
discussed in another thread.  And such factories will likely
be defined there, e.g.
---CUT---
double[] re = new double[] {1, 2, 3};
double[] im = new double[] {4, 5, 6};
ComplexList numbers = ComplexList.ofCartesian(re, im);
---CUT---
and
---CUT---
double[] interleaved = new double[] {1, 4, 2, 5, 3, 6};
ComplexList numbers = ComplexList.ofCartesian(interleaved);
---CUT---
etc.

> >
> >> We also have negate() as a helper. I think the following are also useful:
> >>
> >> Complex multiplyI()
> >> Complex multiplyNegativeI()
> > Applicability (and name) is not straightforward, other than where it is
> > used internally.  I'd keep those "private".
> >
> > For the API, perhaps a more general
> > ---CUT---
> > public Complex multiplyImaginary(double im) {
> >  return ofCartesian(-im * imaginary, im * real);
> > }
> > ---CUT---
> > ?
>
> That is slated to be added for NUMBERS-139 [1].
>
> I put in the extra add and subtract functions for ISO C99 Annex G.5.2. I
> have not done the multiplication and divisions from Annex G.5.1 yet. I
> will do those soon.
>
> However unlike add/subtract I do not think that there is an issue with
> maintaining or negating the sign of 0.0 if multiplied by 1 or -1 rather
> than by leaving it as is or just inverting the sign. So perhaps the
> optimisation multiplyI() is redundant, i.e:

It is redundant with the above method; so I'd keep it private (and later
drop it, if benchmarks don't show that there is a performance improvement
over multiplication by 1).
Overall, I'd think that performance considerations should not impact on the
API for the first release.

Regards,
Gilles

>
> 1 * -0.0 = -0.0
> -1 * -0.0 = 0.0
> 1 * 0.0 = 0.0
> -1 * 0.0 = -0.0
>
> I go through the edge cases when I write the tests.
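
The four sign-of-zero products listed above can be checked with a small
sketch. The helper "sameBits" is a hypothetical name; it compares bit
patterns because 0.0 == -0.0 is true in Java and would hide the sign.

```java
final class SignedZeroSketch {
    private SignedZeroSketch() {}

    /** True when both values have the same bit pattern (0.0 and -0.0 differ). */
    static boolean sameBits(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }

    public static void main(String[] args) {
        System.out.println(sameBits(1 * -0.0, -0.0));
        System.out.println(sameBits(-1 * -0.0, 0.0));
        System.out.println(sameBits(1 * 0.0, 0.0));
        System.out.println(sameBits(-1 * 0.0, -0.0));
    }
}
```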
>
>
> [1]
> https://issues.apache.org/jira/projects/NUMBERS/issues/NUMBERS-139?filter=allopenissues
>
> >
> > Gilles
> >
> >> These two operations appear a lot in the formulas for the trigonometric
> >> functions.
> >>
> >> They essentially just swap the real and imaginary parts and update the 
> >> sign appropriately.
> >>
> >> Alex


Re: [numbers] Complex.ofImaginary and multiplyI

2019-12-12 Thread Gilles Sadowski

Hello.

Le jeu. 12 déc. 2019 à 10:04, Alex Herbert  a écrit :
>
> There is a factory constructor:
>
> Complex.ofReal(double)
>
> For completeness I think we should add:
>
> Complex.ofImaginary(double)

I wonder whether we should remove "ofReal".
It's a shortcut that does not save a lot of typing, IMHO not worth adding
to the API.

>
> We also have negate() as a helper. I think the following are also useful:
>
> Complex multiplyI()
> Complex multiplyNegativeI()

Applicability (and name) is not straightforward, other than where it is
used internally.  I'd keep those "private".

For the API, perhaps a more general
---CUT---
public Complex multiplyImaginary(double im) {
return ofCartesian(-im * imaginary, im * real);
}
---CUT---
?
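
A minimal sketch of how the general method relates to the two helpers,
working on raw (real, imaginary) pairs. The class
"MultiplyImaginarySketch" and the array return type are assumptions for
illustration only.

```java
final class MultiplyImaginarySketch {
    private MultiplyImaginarySketch() {}

    /** (real + i imaginary) * (i im) = -im * imaginary + i (im * real). */
    static double[] multiplyImaginary(double real, double imaginary, double im) {
        return new double[] {-im * imaginary, im * real};
    }

    /** Multiplication by i: swap the parts and negate the new real part. */
    static double[] multiplyI(double real, double imaginary) {
        return new double[] {-imaginary, real};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(multiplyImaginary(3, 4, 1)));
        System.out.println(java.util.Arrays.toString(multiplyI(3, 4)));
        System.out.println(java.util.Arrays.toString(multiplyImaginary(3, 4, -1)));
    }
}
```

With im = 1 the general method gives the same parts as the swap-and-negate
helper, and with im = -1 it gives the parts for multiplication by -i.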

Gilles

> These two operations appear a lot in the formulas for the trigonometric
> functions.
>
> They essentially just swap the real and imaginary parts and update the sign 
> appropriately.
>
> Alex


Re: [commons-numbers] 02/03: Added getAbsolute() to complement getArgument().

2019-12-08 Thread Gilles Sadowski

2019-12-08 18:52 UTC+01:00, Alex Herbert :
>
>
>> …
>
> Would this also apply to conjugate() and conj()?
>
> ISO C99 uses conj().

+1 (Why not?)


Gilles


Re: [commons-numbers] 02/03: Added getAbsolute() to complement getArgument().

2019-12-08 Thread Gilles Sadowski

Le dim. 8 déc. 2019 à 13:32, Alex Herbert  a écrit :
>
>
>
> > On 8 Dec 2019, at 11:24, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > 2019-12-08 10:45 UTC+01:00, Alex Herbert :
> >> On Sun, 8 Dec 2019, 01:50 Gilles Sadowski,  wrote:
> >>
> >>> 2019-12-08 2:00 UTC+01:00, aherb...@apache.org :
> >>>> This is an automated email from the ASF dual-hosted git repository.
> >>>>
> >>>> aherbert pushed a commit to branch master
> >>>> in repository https://gitbox.apache.org/repos/asf/commons-numbers.git
> >>>>
> >>>> commit 65f60835e2afe26bdaba9665d62edb25195bfff6
> >>>> Author: Alex Herbert 
> >>>> AuthorDate: Sun Dec 8 00:54:03 2019 +
> >>>>
> >>>>Added getAbsolute() to complement getArgument().
> >>>
> >>> Is that name part of the reference?
> >>> If not, "getMagnitude" or "getModulus" would be clearer.
> >>>
> >>
> >> There was an arg() and abs() as in C99. But also a getArgument() but not a
> >> getAbsolute().
> >>
> >> The API is still a bit inconsistent. For example there is proj() but not
> >> getProjection().
> >
> > I recall that Eric and I had a discussion (on this ML) about whether
> > to use the Java convention or copy names from another source.
> >
> >>
> >> I think we should drop getArgument() and getAbsolute() in favour of the C99
> >> arg() and abs(). The only methods with get on the front should be for the
> >> properties real and imaginary.
> >
> > IMO, that "argument" and "magnitude" are not properties is an
> > "implementation detail".

Sorry, one consideration not taken into account in the above statement:
the class "Complex", as a ValJO, might be defined as the "Cartesian
representation of a complex number".  That's also what would be deduced
from the "toString()" and "parse(String)" methods.
The fields "real" and "imaginary" are indeed "properties" and part of the API.

Hence, it is perhaps better to remove "getAbsolute()" and "getArgument()",
so that, among other things, the user is aware that a computation will be
performed on each call.

Regards,
Gilles

> > If we'd decide to not follow the Java convention for method names
> > (i.e. adhering to another standard is deemed better -- IIRC, Eric
> > mentioned it would be easier for developers to port code from
> > C++), then "getReal()" and "getImaginary()" could be dropped.
> >
> > IIRC, at some point, we wondered whether to implement a low-level
> > class that could hold
> > * re
> > * im
> > and/or
> > * mag
> > * arg
> > to allow using the most favourable representation for a given
> > computation.
> > Eventually, we stuck to the simple plan (on the basis that it hasn't
> > been done that way in the C++ standard library).
>
> Perhaps on that basis we should leave the getX() methods for those values 
> that can be used to represent a Complex, implementation details hidden. Thus 
> we have these with their C99 equivalents:
>
> getReal() or real()
> getImaginary() or imag()
> getAbsolute() or abs()
> getArgument() or arg()
>
> Removing all the getX methods moves away from the Java convention for 
> properties. The C99 aliases make it easy to port code. All other methods 
> should use the C99 names.
>
> This is how the class currently stands so no changes.
>
> >
> >>
> >> Then all methods should be the C99 names.
> >
> > I'm fine with removing the aliases.
> > But then we need to have a look at the "Quaternion" class and
> > apply the same decision, for minimal API consistency between
> > implementations of related concepts.
> >
> > Best,
> > Gilles
> >
> >>>
> >>>> ---
> >>>> .../apache/commons/numbers/complex/Complex.java| 33
> >>>> ++
> >>>> .../commons/numbers/complex/ComplexTest.java   | 22
> >>>> +--
> >>>> 2 files changed, 41 insertions(+), 14 deletions(-)
> >>>>
> >>>> diff --git
> >>>>
> >>> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> >>>>
> >>> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java

Re: [commons-numbers] 02/03: Added getAbsolute() to complement getArgument().

2019-12-08 Thread Gilles Sadowski

Hi.

2019-12-08 10:45 UTC+01:00, Alex Herbert :
> On Sun, 8 Dec 2019, 01:50 Gilles Sadowski,  wrote:
>
>> 2019-12-08 2:00 UTC+01:00, aherb...@apache.org :
>> > This is an automated email from the ASF dual-hosted git repository.
>> >
>> > aherbert pushed a commit to branch master
>> > in repository https://gitbox.apache.org/repos/asf/commons-numbers.git
>> >
>> > commit 65f60835e2afe26bdaba9665d62edb25195bfff6
>> > Author: Alex Herbert 
>> > AuthorDate: Sun Dec 8 00:54:03 2019 +
>> >
>> > Added getAbsolute() to complement getArgument().
>>
>> Is that name part of the reference?
>> If not, "getMagnitude" or "getModulus" would be clearer.
>>
>
> There was an arg() and abs() as in C99. But also a getArgument() but not a
> getAbsolute().
>
> The API is still a bit inconsistent. For example there is proj() but not
> getProjection().

I recall that Eric and I had a discussion (on this ML) about whether
to use the Java convention or copy names from another source.

>
> I think we should drop getArgument() and getAbsolute() in favour of the C99
> arg() and abs(). The only methods with get on the front should be for the
> properties real and imaginary.

IMO, that "argument" and "magnitude" are not properties is an
"implementation detail".
If we'd decide to not follow the Java convention for method names
(i.e. adhering to another standard is deemed better -- IIRC, Eric
mentioned it would be easier for developers to port code from
C++), then "getReal()" and "getImaginary()" could be dropped.

IIRC, at some point, we wondered whether to implement a low-level
class that could hold
 * re
 * im
and/or
 * mag
 * arg
to allow using the most favourable representation for a given
computation.
Eventually, we stuck to the simple plan (on the basis that it hasn't
been done that way in the C++ standard library).

>
> Then all methods should be the C99 names.

I'm fine with removing the aliases.
But then we need to have a look at the "Quaternion" class and
apply the same decision, for minimal API consistency between
implementations of related concepts.

Best,
Gilles

>>
>> > ---
>> >  .../apache/commons/numbers/complex/Complex.java| 33
>> > ++
>> >  .../commons/numbers/complex/ComplexTest.java   | 22
>> > +--
>> >  2 files changed, 41 insertions(+), 14 deletions(-)
>> >
>> > diff --git
>> >
>> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>> >
>> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>> > index bbee805..92f4c2a 100644
>> > ---
>> >
>> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>> > +++
>> >
>> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>> > @@ -284,13 +284,33 @@ public final class Complex implements
>> Serializable  {
>> >  return this;
>> >  }
>> >
>> > - /**
>> > - * Return the absolute value of this complex number.
>> > - * This code follows the <a href="http://www.iso-9899.info/wiki/The_Standard">ISO C Standard</a>, Annex G,
>> > - * in calculating the returned value (i.e. the hypot(x,y) method)
>> > - * and in handling of NaNs.
>> > +/**
>> > + * Return the absolute value of this complex number. This is also
>> > called norm, modulus,
>> > + * or magnitude.
>> > + * abs(a + b i) = sqrt(a^2 + b^2)
>> > + *
>> > + * If either component is infinite then the result is positive
>> > infinity. If either
>> > + * component is NaN and this is not {@link #isInfinite() infinite}
>> then
>> > the result is NaN.
>> > + *
>> > + * This code follows the
>> > + * <a href="http://www.iso-9899.info/wiki/The_Standard">ISO C Standard</a>, Annex G,
>> > + * in calculating the returned value using the {@code hypot(x,y)}
>> > method.
>> >   *
>> >   * @return the absolute value.
>> > + * @see #isInfinite()
>> > + * @see #isNaN()
>> > + * @see Math#hypot(double, double)
>> > + */
>> > +public double getAbsolute() {
>> > +return getAbsolute(real, imaginary);
>> > +}
>> > +
>>
>>
>>
>


Re: [commons-numbers] 02/03: Added getAbsolute() to complement getArgument().

2019-12-07 Thread Gilles Sadowski

2019-12-08 2:00 UTC+01:00, aherb...@apache.org :
> This is an automated email from the ASF dual-hosted git repository.
>
> aherbert pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/commons-numbers.git
>
> commit 65f60835e2afe26bdaba9665d62edb25195bfff6
> Author: Alex Herbert 
> AuthorDate: Sun Dec 8 00:54:03 2019 +
>
> Added getAbsolute() to complement getArgument().

Is that name part of the reference?
If not, "getMagnitude" or "getModulus" would be clearer.

Gilles

> ---
>  .../apache/commons/numbers/complex/Complex.java| 33
> ++
>  .../commons/numbers/complex/ComplexTest.java   | 22 +--
>  2 files changed, 41 insertions(+), 14 deletions(-)
>
> diff --git
> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> index bbee805..92f4c2a 100644
> ---
> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> +++
> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> @@ -284,13 +284,33 @@ public final class Complex implements Serializable  {
>  return this;
>  }
>
> - /**
> - * Return the absolute value of this complex number.
> - * This code follows the <a href="http://www.iso-9899.info/wiki/The_Standard">ISO C Standard</a>, Annex G,
> - * in calculating the returned value (i.e. the hypot(x,y) method)
> - * and in handling of NaNs.
> +/**
> + * Return the absolute value of this complex number. This is also
> called norm, modulus,
> + * or magnitude.
> + * abs(a + b i) = sqrt(a^2 + b^2)
> + *
> + * If either component is infinite then the result is positive
> infinity. If either
> + * component is NaN and this is not {@link #isInfinite() infinite} then
> the result is NaN.
> + *
> + * This code follows the
> + * <a href="http://www.iso-9899.info/wiki/The_Standard">ISO C Standard</a>, Annex G,
> + * in calculating the returned value using the {@code hypot(x,y)}
> method.
>   *
>   * @return the absolute value.
> + * @see #isInfinite()
> + * @see #isNaN()
> + * @see Math#hypot(double, double)
> + */
> +public double getAbsolute() {
> +return getAbsolute(real, imaginary);
> +}
> +


Re: [numbers] Complex

2019-12-06 Thread Gilles Sadowski

Hi.

Le ven. 6 déc. 2019 à 16:47, Alex Herbert  a écrit :
>
> I think this method is redundant:
>
> public Complex multiply(final int factor) {
>  return new Complex(real * factor, imaginary * factor);
> }
>
> given that there is:
>
> public Complex multiply(double factor) {
>  return new Complex(real * factor, imaginary * factor);
> }
>
> This would be equivalent after the implicit conversion of the int to a
> double for the multiplication. Am I missing something here?

I don't think so (or I'm missing it too).
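
A minimal sketch of why the two overloads agree, assuming raw
(real, imaginary) pairs; the class "MultiplyOverloadSketch" is
hypothetical. In "real * factor" an int factor is promoted to double
before the multiplication, and every int value is exactly representable
as a double.

```java
final class MultiplyOverloadSketch {
    private MultiplyOverloadSketch() {}

    /** Mirrors the int overload: factor is promoted to double in each product. */
    static double[] multiplyInt(double real, double imaginary, int factor) {
        return new double[] {real * factor, imaginary * factor};
    }

    /** Mirrors the double overload. */
    static double[] multiplyDouble(double real, double imaginary, double factor) {
        return new double[] {real * factor, imaginary * factor};
    }

    public static void main(String[] args) {
        int[] factors = {-3, 0, 7, Integer.MAX_VALUE};
        for (int f : factors) {
            double[] viaInt = multiplyInt(1.5, -2.25, f);
            double[] viaDouble = multiplyDouble(1.5, -2.25, f);
            System.out.println(f + ": "
                + (Double.compare(viaInt[0], viaDouble[0]) == 0
                   && Double.compare(viaInt[1], viaDouble[1]) == 0));
        }
    }
}
```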

>
> There is no equivalent for divide(double) (i.e. divide(int)).
>
> I propose to remove this method.

+1

>
>
> I would like to reorder the internals of Complex to be like the order of
> the C99 reference.

+1

Gilles

> At the moment the methods are a bit jumbled.
>
> I think something like this:
>
> Constructors
>
> Properties (real(), imaginary())
>
> Properties which are methods (abs(), arg())
>
> Standard object stuff:
> toString(), parse(), equals(), hashCode()
> (Perhaps parse should be with the factory constructor methods?)
>
> C99 order for Complex math:
>
> Arithmetic (add, subtract, multiply, divide)
>
> Trigonometric functions
>
> Hyperbolic functions
>
> Exponential and logarithmic functions
>
> Power and absolute-value function
>
> Other functions...
>
>
> Alex
>
>


Re: [commons-numbers] 04/04: Preserve even function in cosh

2019-12-04 Thread Gilles Sadowski

Test failure here:
---CUT---
[ERROR] Failures:
[ERROR]   CStandardTest.testCosh:692->assertComplex:275 Operation
failed (z=(-Infinity,0.5)). Expected: (-Infinity,-Infinity) but was:
(Infinity,-Infinity)
---CUT---

Gilles
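
For reference, a minimal sketch of the "inf * cis(y)" branch from the
commit quoted below, assuming an infinite real part and a finite, nonzero
imaginary part. This is not the committed code; the class
"CoshInfiniteSketch" and the array return type are hypothetical.

```java
final class CoshInfiniteSketch {
    private CoshInfiniteSketch() {}

    /**
     * cosh for an infinite real part and a finite, nonzero imaginary part.
     * The even function f(z) = f(-z) is preserved by reflecting z to -z
     * when the real part is negative.
     */
    static double[] coshInfiniteReal(double real, double imaginary) {
        if (real < 0) {
            real = -real;
            imaginary = -imaginary;
        }
        // inf * cis(y)
        return new double[] {real * Math.cos(imaginary), real * Math.sin(imaginary)};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(
            coshInfiniteReal(Double.NEGATIVE_INFINITY, 0.5)));
        System.out.println(java.util.Arrays.toString(
            coshInfiniteReal(Double.POSITIVE_INFINITY, -0.5)));
    }
}
```

Under this sketch, z = (-Infinity, 0.5) and -z = (Infinity, -0.5)
evaluate to the same pair, which is what preserving the even function
means here.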

2019-12-05 1:45 UTC+01:00, aherb...@apache.org :
> This is an automated email from the ASF dual-hosted git repository.
>
> aherbert pushed a commit to branch master
> in repository https://gitbox.apache.org/repos/asf/commons-numbers.git
>
> commit 8963bd191ca80af6b1ba3f94998d5be0d64e43ac
> Author: Alex Herbert 
> AuthorDate: Thu Dec 5 00:44:27 2019 +
>
> Preserve even function in cosh
> ---
>  .../java/org/apache/commons/numbers/complex/Complex.java   | 14
> +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git
> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> index 490380b..e5b9ca8 100644
> ---
> a/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> +++
> b/commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
> @@ -1241,6 +1241,12 @@ public final class Complex implements Serializable
> {
>  return constructor.create(Math.cosh(real) *
> Math.cos(imaginary),
>Math.sinh(real) *
> Math.sin(imaginary));
>  }
> +// ISO C99: Preserve the even function
> +// f(z) = f(-z)
> +if (negative(real)) {
> +real = -real;
> +imaginary = -imaginary;
> +}
>  // Special case for real == 0
>  final double im = real == 0 ? Math.copySign(0, imaginary) :
> Double.NaN;
>  return constructor.create(Double.NaN, im);
> @@ -1253,6 +1259,12 @@ public final class Complex implements Serializable
> {
>  return constructor.create(Double.POSITIVE_INFINITY,
> im);
>  }
>  // inf * cis(y)
> +// ISO C99: Preserve the even function
> +// f(z) = f(-z)
> +if (real < 0) {
> +real = -real;
> +imaginary = -imaginary;
> +}
>  final double re = real * Math.cos(imaginary);
>  final double im = real * Math.sin(imaginary);
>  return constructor.create(re, im);
> @@ -1262,7 +1274,7 @@ public final class Complex implements Serializable  {
>  }
>  // real is NaN
>  if (imaginary == 0) {
> -return constructor.create(Double.NaN, Math.copySign(0,
> imaginary));
> +return constructor.create(Double.NaN, imaginary);
>  }
>  // optionally raises the ‘‘invalid’’ floating-point exception, for
> nonzero y.
>  return NAN;
>
>


Re: [Geometry] Coveralls stuck

2019-12-03 Thread Gilles Sadowski

Le mar. 3 déc. 2019 à 19:21, Alex Herbert  a écrit :
>
>
> On 03/12/2019 16:43, Alex Herbert wrote:
> >
> > ...
> >
> > I am going to try option 1.
>
> Seems to be fixed now.

Thanks!

Gilles

>
> Alex
>
>


Re: [Statistics] New component for (standard) distributions?

2019-12-03 Thread Gilles Sadowski

Hello.

Le mar. 3 déc. 2019 à 18:41, Eric Barnhill  a écrit :
>
> Sorry I misspoke. Anyway I quite agree, it would work well as
> commons-distribution. +1

No problem. :)

If you have time, some "little" things would be:
* review the distribution classes and ensure that the conventions are uniform
(we had some issues raised in relation to the naming of the parameters): i.e.
all implementations should ideally comply with the articles on Wikipedia and/or
MathWorld,
* increase code coverage (through unit tests).[1]

Best,
Gilles

[1] 
https://sonarcloud.io/component_measures?id=commons-statistics=coverage=list

>
> On Tue, Dec 3, 2019 at 9:29 AM Gilles Sadowski  wrote:
>
> > Hi.
> >
> > Le mar. 3 déc. 2019 à 18:23, Eric Barnhill  a
> > écrit :
> > >
> > > I agree, distributions seems stable and well supported.
> > >
> > > You are proposing releasing it outside of numbers?
> >
> > Code is currently in module "commons-statistics-distribution", within
> > the [Statistics] component (actually: the sole non-empty module!).
> > The proposal is to have a standalone "Commons Distribution" component
> > (that will depend on "Commons Numbers" and "Commons RNG").
> >
> > Gilles
> >
> > >
> > > I think it's a good idea. +1
> > >
> > > On Tue, Dec 3, 2019 at 8:00 AM Gilles Sadowski 
> > wrote:
> > >
> > > > Hello.
> > > >
> > > > Most functionality of the "o.a.c.math4.distribution" package was
> > migrated
> > > > from Commons Math almost 2 years ago.
> > > > The [Statistics] component should also host a refactoring of CM's
> > "stat"
> > > > package but development has stalled.  Obviously, it is unlikely that
> > we can
> > > > perform this task in the short term, while the design of the
> > "distribution"
> > > > module looks fairly stable (and it had already been refactored within
> > CM).
> > > > IMO, it should belong to its own maven project so it can be released
> > > > without
> > > > being encumbered for months (or years?) by the instability of the rest
> > of
> > > > the port...
> > > >
> > > > WDYT?
> > > >
> > > > Gilles
> > > >
> > > >
> > > >
> >
> >
> >


Re: [Geometry] Towards a release?

2019-12-03 Thread Gilles Sadowski

Hi Matt.

Le mar. 3 déc. 2019 à 18:37, Matt Juntunen  a écrit :
>
> I just added a number of new issues to JIRA. The ones that I would consider 
> required for a 1.0 release are:
>
> -GEOMETRY-67 - OutlineExtractor
> -GEOMETRY-68 - Raycast/Linecast API
> -GEOMETRY-69 - BSPTreeVisitor stop visit
> -GEOMETRY-72 - Boundary API
> -GEOMETRY-77 - BoundsXD classes
> -GEOMETRY-73 - Create User Guide

Nice list!
But which tasks would be indispensable for a 0.1 (beta) release?
Indeed, I wonder whether we should try and attract some attention (through
an official release) so that we can get all the possible collective
help in order
to be confident that the API is universally (!) useful.

Best regards,
Gilles

>
> -Matt
> ____
> From: Gilles Sadowski 
> Sent: Tuesday, December 3, 2019 11:08 AM
> To: Commons Developers List 
> Subject: [Geometry] Towards a release?
>
> Hello.
>
> Are there blocking issues?
> Would it be useful to release a "beta" version?
>
> Gilles


Re: [Numbers] Towards a release?

2019-12-03 Thread Gilles Sadowski

Hello.

Le mar. 3 déc. 2019 à 18:33, Eric Barnhill  a écrit :
>
> It seems like we're pretty close.
>
> I can take a look at 136, 137 related to log. I have had trouble finding
> the space to launch the regression project. But I could work on some
> smaller things.

Great. ;-)

>
> Regarding 70, the user guide, what do you think of submitting an
> application to Google Season of Docs?

It would be nice, but probably low(er) priority; in effect, having split off
several components out of Commons Math has the nice (IMO) side-effect
that any of them is easier to grasp, and figuring out what it does and how
to use it is in general relatively straightforward.
Moreover, most developers looking for such tools don't have to be told what
a complex number is, and (maven) modules make it easy to navigate the
code base.

> I can initiate if you have had quite
> enough of that sort of thing.

Indeed, delegating documentation tasks was often more work than it would
have been doing it! :-}

However, a user guide for "Commons Geometry" is on the TODO list.[1]
You should then coordinate with Matt.

Thanks,
Gilles

[1] https://issues.apache.org/jira/browse/GEOMETRY-73

>
> Eric
>
>
> On Tue, Dec 3, 2019 at 7:41 AM Gilles Sadowski  wrote:
>
> > Hello.
> >
> > What do you think of releasing "Commons Numbers"?
> > Please have a look at the list of pending issues.[1]
> >
> > Gilles
> >
> > [1]
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20NUMBERS%20AND%20fixVersion%20%3D%201.0%20AND%20statusCategory%20%3D%20new


Re: [Statistics] New component for (standard) distributions?

2019-12-03 Thread Gilles Sadowski

Hi.

Le mar. 3 déc. 2019 à 18:23, Eric Barnhill  a écrit :
>
> I agree, distributions seems stable and well supported.
>
> You are proposing releasing it outside of numbers?

Code is currently in module "commons-statistics-distribution", within
the [Statistics] component (actually: the sole non-empty module!).
The proposal is to have a standalone "Commons Distribution" component
(that will depend on "Commons Numbers" and "Commons RNG").

Gilles

>
> I think it's a good idea. +1
>
> On Tue, Dec 3, 2019 at 8:00 AM Gilles Sadowski wrote:
>
> > Hello.
> >
> > Most functionality of the "o.a.c.math4.distribution" package was migrated
> > from Commons Math almost 2 years ago.
> > The [Statistics] component should also host a refactoring of CM's "stat"
> > package but development has stalled.  Obviously, it is unlikely that we can
> > perform this task in the short term, while the design of the "distribution"
> > module looks fairly stable (and it had already been refactored within CM).
> > IMO, it should belong to its own maven project so it can be released without
> > being encumbered for months (or years?) by the instability of the rest of
> > the port...
> >
> > WDYT?
> >
> > Gilles
> >

Re: [Geometry] Exceptions hierarchy

2019-12-03 Thread Gilles Sadowski

Hello.

2019-12-03 17:13 UTC+01:00, Matt Juntunen:
> Hello,
>
> I don't feel like ArithmeticException quite captures all of the possible
> geometry exception states. For example, it seems odd to me to throw an
> ArithmeticException when a plane cannot be constructed due to a given set of
> points being collinear [1].

In that method, passing collinear points in argument "pts" (the list of
points) is, for all intents and purposes, a caller's mistake (requesting
a plane from collinear points); hence the call should actually result in
an "IllegalArgumentException" being thrown.

I'd suggest that "fromPoints" be refactored into
---CUT---
public static boolean isCollinear(Collection<Vector3D> pts,
                                  DoublePrecisionContext prec)
---CUT---
and
---CUT---
public static Plane fromPoints(Collection<Vector3D> pts,
                               DoublePrecisionContext prec) {
    // ...

    if (isCollinear(pts, prec)) {
        throw new IllegalArgumentException("Collinear points");
    }

    // ...
}
---CUT---
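
To make the "isCollinear"/"fromPoints" split above concrete, here is a minimal,
self-contained sketch. It is not the Commons Geometry code: points are plain
"double[3]" arrays instead of "Vector3D", the tolerance is a bare "double"
instead of a "DoublePrecisionContext", and the names "PlaneSketch",
"unitNormal" and "firstDirection" are hypothetical, chosen only for
illustration.

```java
import java.util.List;

/** Hypothetical stand-in for the proposed refactoring; not the Commons Geometry API. */
final class PlaneSketch {
    private PlaneSketch() {}

    /** Returns true if every point lies within "eps" of a single line. */
    public static boolean isCollinear(List<double[]> pts, double eps) {
        double[] dir = firstDirection(pts, eps);
        if (dir == null) {
            return true; // Fewer than two distinct points: trivially collinear.
        }
        double[] p0 = pts.get(0);
        for (double[] p : pts) {
            // |dir x (p - p0)| / |dir| is the distance from p to the line.
            if (norm(cross(dir, sub(p, p0))) > eps * norm(dir)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Unit normal of the plane spanned by the points; collinear input is
     * reported as a caller's mistake. Coplanarity of all points is NOT checked.
     */
    public static double[] unitNormal(List<double[]> pts, double eps) {
        if (isCollinear(pts, eps)) {
            throw new IllegalArgumentException("Collinear points");
        }
        double[] p0 = pts.get(0);
        double[] dir = firstDirection(pts, eps);
        for (double[] p : pts) {
            double[] n = cross(dir, sub(p, p0));
            double len = norm(n);
            if (len > eps * norm(dir)) {
                return new double[] {n[0] / len, n[1] / len, n[2] / len};
            }
        }
        throw new IllegalStateException("Unreachable: points are not collinear");
    }

    /** Vector from the first point to the first point distinct from it, or null. */
    private static double[] firstDirection(List<double[]> pts, double eps) {
        if (pts.isEmpty()) {
            return null;
        }
        double[] p0 = pts.get(0);
        for (double[] p : pts) {
            double[] d = sub(p, p0);
            if (norm(d) > eps) {
                return d;
            }
        }
        return null;
    }

    private static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    private static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    private static double norm(double[] v) {
        return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    }
}
```

With this split, a caller who cannot rule out degenerate input can test
"isCollinear" first, and the factory only needs a standard JDK exception to
signal misuse.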

> That idea feels beyond the concept of
> "arithmetic".

IMHO, we should aim for the leanest API.  In the above case, nothing
is added by throwing a custom type.

> However, I do think that the GeometryValueException subclass
> is not useful and should be removed.

+1

> On a more general note, the rule I've been following recently is to throw
> JDK exceptions like IllegalArgumentException for programming-level errors
> (eg, passing an array of the wrong length to a method) and GeometryException
> or an appropriate subclass for errors related to geometric operations.

I understand the wish to make a distinction, but what purpose
does it serve from a user's POV?  And from our POV, all these "little"
things will get in the way of making backward-compatible changes.

Gilles

>
> -Matt
>
>
> [1]
> https://github.com/apache/commons-geometry/blob/master/commons-geometry-euclidean/src/main/java/org/apache/commons/geometry/euclidean/threed/Plane.java#L733
>
> 
> From: Gilles Sadowski 
> Sent: Thursday, November 28, 2019 8:54 PM
> To: Commons Developers List 
> Subject: [Geometry] Exceptions hierarchy
>
> Hello.
>
> Is there any value added by "GeometryException" over the standard
> "ArithmeticException"?
> If not, I'd rather have the public API advertise the latter.
>
> We could have an *internal* utility class that instantiates exceptions:
> ---CUT---
> public class ExceptionFactory {
>  public ArithmeticException illegalNorm(double norm) {
>   return new ArithmeticException("Illegal norm: " + norm);
>  }
> }
> ---CUT---
>
> Regards,
> Gilles
>

[Geometry] Towards a release?

2019-12-03 Thread Gilles Sadowski

Hello.

Are there blocking issues?
Would it be useful to release a "beta" version?

Gilles


[Statistics] New component for (standard) distributions?

2019-12-03 Thread Gilles Sadowski

Hello.

Most functionality of the "o.a.c.math4.distribution" package was migrated
from Commons Math almost 2 years ago.
The [Statistics] component should also host a refactoring of CM's "stat"
package but development has stalled.  Obviously, it is unlikely that we can
perform this task in the short term, while the design of the "distribution"
module looks fairly stable (and it had already been refactored within CM).
IMO, it should belong to its own maven project so it can be released without
being encumbered for months (or years?) by the instability of the rest of
the port...

WDYT?

Gilles


[Numbers] Towards a release?

2019-12-03 Thread Gilles Sadowski

Hello.

What do you think of releasing "Commons Numbers"?
Please have a look at the list of pending issues.[1]

Gilles

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20NUMBERS%20AND%20fixVersion%20%3D%201.0%20AND%20statusCategory%20%3D%20new


[Geometry] Coveralls stuck

2019-12-03 Thread Gilles Sadowski

Hello.

Any idea why Coveralls does not pick up the latest build for "Commons Geometry"?
https://coveralls.io/github/apache/commons-geometry?branch=master
Coverage is quite good in fact:
https://sonarcloud.io/dashboard?id=commons-geometry

Regards,
Gilles


[Statistics] Purpose of "ConstantContinuousDistribution"?

2019-11-30 Thread Gilles Sadowski

Hello.

Class "ConstantContinuousDistribution"[1] was ported (and renamed)
from CM's "ConstantRealDistribution".[2]
It has some strange (?) features, and no reference is mentioned in the
Javadoc.
It turns out that its sole usage is in "EmpiricalDistribution", and that
latter class is still in CM.  Perhaps this special-purpose implementation
was needed there as a workaround... Hence I propose to move it back
to CM.  Moreover, I would make it a private nested class of
"EmpiricalDistribution".

Any objection?

Regards,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-statistics.git;a=blob;f=commons-statistics-distribution/src/main/java/org/apache/commons/statistics/distribution/ConstantContinuousDistribution.java;h=85c480dc3e76bc540460da13405bb2409156d1f4;hb=HEAD
[2] 
http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/distribution/ConstantRealDistribution.html
