[Math] Splitting "legacy" codes into maven modules

2021-05-27 Thread Gilles Sadowski
Hello.

As mentioned in another thread[1] it will be worth collecting several
maven modules under module "commons-math-legacy" (currently
in "modularized_master" branch[2]).
It is an intermediate step that focuses on severing spurious cyclic
dependencies.  This refactoring is also an opportunity to hide
functionality that should not be part of the public API (i.e. make the
corresponding classes private or package-private).

This thread is dedicated to listing the pending changes.

Regards,
Gilles

[1] https://markmail.org/message/65654d2qvcayqyfk
[2] To be merged into "master" sometime, soon...

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Math] FastMath missing methods from java.lang.Math...

2021-05-27 Thread Gilles Sadowski
Hi.

On Thu, 27 May 2021 at 11:32, Erik Svensson wrote:
>
> Howdy all!
>
>
>
> I’m comparing FastMath with java.lang.Math and I notice that FastMath is no 
> longer Math-complete.
>
> Ie, jlM has methods that FastMath doesn’t have.

Incidentally, that's why the unit test suite fails when building with
JDK 9+ (when "JAVA_HOME" is not set properly).

>
> FM is documented as ‘drop-in replacement’ so should the new jlM methods be 
> added to FM?

Do you propose to write/port pure Java implementations for those?

Regards,
Gilles

> [...]
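
To make the "Math-complete" question concrete, here is a minimal sketch of
how the gap could be listed by reflection.  It is only an illustration: the
class "MissingMethods" and the nested "Partial" class are hypothetical names,
and "Partial" merely stands in for an incomplete replacement class (it is not
"FastMath").

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: lists public static methods that a candidate class lacks. */
final class MissingMethods {
    private MissingMethods() {}

    /**
     * @param reference Class whose public static methods define the expected API.
     * @param candidate Class that is supposed to be a drop-in replacement.
     * @return signatures present in "reference" but absent from "candidate".
     */
    static SortedSet<String> missing(Class<?> reference, Class<?> candidate) {
        final SortedSet<String> result = new TreeSet<>();
        for (Method m : reference.getDeclaredMethods()) {
            if (!isPublicStatic(m)) {
                continue; // Only the public static API matters here.
            }
            try {
                final Method c = candidate.getDeclaredMethod(m.getName(), m.getParameterTypes());
                if (!isPublicStatic(c)) {
                    result.add(signature(m));
                }
            } catch (NoSuchMethodException e) {
                result.add(signature(m));
            }
        }
        return result;
    }

    private static boolean isPublicStatic(Method m) {
        final int mod = m.getModifiers();
        return Modifier.isPublic(mod) && Modifier.isStatic(mod);
    }

    private static String signature(Method m) {
        final StringBuilder sb = new StringBuilder(m.getName()).append('(');
        final Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(params[i].getSimpleName());
        }
        return sb.append(')').toString();
    }

    /** Hypothetical, deliberately incomplete replacement class (assumption). */
    static final class Partial {
        private Partial() {}

        public static int abs(int a) {
            return a < 0 ? -a : a;
        }
    }
}
```

Calling "MissingMethods.missing(Math.class, SomeReplacement.class)" on a newer
JDK would list whatever has been added to "java.lang.Math" since the
replacement class was last synchronized.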




Re: [RNG] Multiple samplers, shared "UniformRandomProvider"

2021-05-25 Thread Gilles Sadowski
On Wed, 26 May 2021 at 00:06, Alex Herbert wrote:
>
> On Tue, 25 May 2021 at 20:15, Gilles Sadowski  wrote:
>
> > Hi.
> >
> > I wonder if/how we would introduce the following functionality:
> > ---CUT---
> > /**
> >  * @param rng Generator that will be shared in the returned sampler.
> >  * @param list Samplers whose underlying generators will be discarded in
> >  * the returned instance.
> >  * @return a sampler sharing the given provider.
> >  */
> > public static ObjectSampler<double[]> withUniformRandomProvider(
> >         final UniformRandomProvider rng,
> >         final SharedStateContinuousSampler... list) {
> >     final SharedStateContinuousSampler[] samplers =
> >         new SharedStateContinuousSampler[list.length];
> >     for (int i = 0; i < list.length; i++) {
> >         samplers[i] = list[i].withUniformRandomProvider(rng);
> >     }
> >
> >     return new ObjectSampler<double[]>() {
> >         /** {@inheritDoc} */
> >         @Override
> >         public double[] sample() {
> >             final double[] out = new double[list.length];
> >             for (int i = 0; i < list.length; i++) {
> >                 out[i] = samplers[i].sample();
> >             }
> >             return out;
> >         }
> >     };
> > }
> > ---CUT---
> >
>
> Note it can return SharedStateObjectSampler. The implementation
> is to create a new instance using the same method:
>
> static SharedStateObjectSampler<double[]> of(UniformRandomProvider rng,
>         SharedStateContinuousSampler... list) {
>     final SharedStateContinuousSampler[] samplers =
>         new SharedStateContinuousSampler[list.length];
>     for (int i = 0; i < list.length; i++) {
>         samplers[i] = list[i].withUniformRandomProvider(rng);
>     }
>
>     return new SharedStateObjectSampler<double[]>() {
>         @Override
>         public double[] sample() {
>             final double[] out = new double[samplers.length];
>             for (int i = 0; i < samplers.length; i++) {
>                 out[i] = samplers[i].sample();
>             }
>             return out;
>         }
>
>         @Override
>         public SharedStateObjectSampler<double[]> withUniformRandomProvider(
>                 UniformRandomProvider rng) {
>             return of(rng, samplers);
>         }
>     };
> }
>
> In this case though the anonymous inner class retains a reference to the
> enclosing class. So chaining the withUniformRandomProvider calls on
> returned objects would have a memory overhead. It would be cleaner to
> return a static class with the same functionality.
>
> Out of interest, do you have a use case?

The purpose is to supersede CM's "UncorrelatedRandomVectorGenerator".[1]

> Possible name:
> CompoundSampler
>
> This is a variation on the CompositeSampler I suggested in another thread.
> A composite sampler (name TBD) is composed of 2 or more samplers. A sampler
> is chosen using a weighted distribution and a single sample returned from 1
> sampler. Here a compound sampler (name TBD) is composed of 2 or more
> samplers. The return sample is a single sample from each sampler in the
> compound.
>
> I will open a ticket and PR with the static CompositeSamplers factory class
> I created. The idea of a compound sampler could be added to the same class.

+1
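
For illustration only, the "composite" variant described in the quoted text
(pick one sampler according to a weighted distribution, return one sample from
it) could be sketched as below.  All names are hypothetical: "Sampler" stands
in for a continuous sampler interface, "DoubleSupplier" stands in for a
"UniformRandomProvider", and "WeightedChoiceSampler" is not the class from the
announced PR.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical stand-in for a continuous sampler (assumption). */
interface Sampler {
    double sample();
}

/** Sketch: chooses one of several samplers with probability proportional to its weight. */
final class WeightedChoiceSampler implements Sampler {
    private final Sampler[] samplers;
    /** Cumulative normalized weights; the last entry is 1. */
    private final double[] cumulative;
    /** Source of uniform deviates in [0, 1) (stand-in for the shared generator). */
    private final DoubleSupplier uniform;

    WeightedChoiceSampler(DoubleSupplier uniform, Sampler[] samplers, double[] weights) {
        if (samplers.length == 0 || samplers.length != weights.length) {
            throw new IllegalArgumentException("Need one weight per sampler");
        }
        double sum = 0;
        for (double w : weights) {
            if (!(w >= 0) || Double.isInfinite(w)) {
                throw new IllegalArgumentException("Invalid weight: " + w);
            }
            sum += w;
        }
        if (!(sum > 0)) {
            throw new IllegalArgumentException("Weights sum to zero");
        }
        this.uniform = uniform;
        this.samplers = samplers.clone();
        this.cumulative = new double[weights.length];
        double acc = 0;
        for (int i = 0; i < weights.length; i++) {
            acc += weights[i] / sum;
            cumulative[i] = acc;
        }
        cumulative[weights.length - 1] = 1; // Guard against round-off.
    }

    @Override
    public double sample() {
        final double u = uniform.getAsDouble();
        int i = 0;
        // Linear search of the cumulative table; fine for a handful of samplers.
        while (i < cumulative.length - 1 && u >= cumulative[i]) {
            i++;
        }
        return samplers[i].sample();
    }
}
```

A production version would probably replace the linear search with a
dedicated discrete sampler when the number of components is large.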

>
> Alex

[1] 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=commons-math-legacy/src/main/java/org/apache/commons/math4/legacy/random/UncorrelatedRandomVectorGenerator.java;h=9df5fef1a39fc535ab66874169f329a9adbef36b;hb=refs/heads/modularized_master
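
As a rough sketch of the suggestion to return a named class rather than an
anonymous one, the "compound" sampler could look as follows.  The interfaces
"Rng" and "SharedSampler" are simplified, hypothetical stand-ins for
"UniformRandomProvider" and "SharedStateContinuousSampler" so that the
example compiles on its own; "CompoundSampler" and "ScaledUniform" are
illustrative names as well.

```java
/** Hypothetical stand-in for "UniformRandomProvider" (assumption). */
interface Rng {
    double nextDouble();
}

/** Hypothetical stand-in for "SharedStateContinuousSampler" (assumption). */
interface SharedSampler {
    double sample();

    SharedSampler withRng(Rng rng);
}

/** Sketch: returns one sample from each component, all drawing from the same generator. */
final class CompoundSampler {
    private final SharedSampler[] samplers;

    private CompoundSampler(SharedSampler[] samplers) {
        this.samplers = samplers;
    }

    static CompoundSampler of(Rng rng, SharedSampler... list) {
        final SharedSampler[] copy = new SharedSampler[list.length];
        for (int i = 0; i < list.length; i++) {
            // Each component is re-created so that it uses the shared generator.
            copy[i] = list[i].withRng(rng);
        }
        return new CompoundSampler(copy);
    }

    double[] sample() {
        final double[] out = new double[samplers.length];
        for (int i = 0; i < samplers.length; i++) {
            out[i] = samplers[i].sample();
        }
        return out;
    }

    /** Chained calls create independent instances that hold only the sampler array. */
    CompoundSampler withRng(Rng rng) {
        return of(rng, samplers);
    }
}

/** Toy component used for the usage example: a scaled uniform deviate. */
final class ScaledUniform implements SharedSampler {
    private final Rng rng;
    private final double scale;

    ScaledUniform(Rng rng, double scale) {
        this.rng = rng;
        this.scale = scale;
    }

    @Override
    public double sample() {
        return scale * rng.nextDouble();
    }

    @Override
    public SharedSampler withRng(Rng newRng) {
        return new ScaledUniform(newRng, scale);
    }
}
```

Usage would be along the lines of
"CompoundSampler.of(rng, samplerX, samplerY).sample()", which returns one
coordinate per component.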




[RNG] Multiple samplers, shared "UniformRandomProvider"

2021-05-25 Thread Gilles Sadowski
Hi.

I wonder if/how we would introduce the following functionality:
---CUT---
/**
 * @param rng Generator that will be shared in the returned sampler.
 * @param list Samplers whose underlying generators will be discarded in
 * the returned instance.
 * @return a sampler sharing the given provider.
 */
public static ObjectSampler<double[]> withUniformRandomProvider(
        final UniformRandomProvider rng,
        final SharedStateContinuousSampler... list) {
    final SharedStateContinuousSampler[] samplers =
        new SharedStateContinuousSampler[list.length];
    for (int i = 0; i < list.length; i++) {
        samplers[i] = list[i].withUniformRandomProvider(rng);
    }

    return new ObjectSampler<double[]>() {
        /** {@inheritDoc} */
        @Override
        public double[] sample() {
            final double[] out = new double[list.length];
            for (int i = 0; i < list.length; i++) {
                out[i] = samplers[i].sample();
            }
            return out;
        }
    };
}
---CUT---

Gilles




Re: [Math][Numbers][Geometry][Statistics] Road map for next release(s)

2021-05-24 Thread Gilles Sadowski
On Sun, 23 May 2021 at 22:54, Alex Herbert wrote:
>
> On Sun, 23 May 2021 at 15:58, Gilles Sadowski  wrote:
>
> >
> > I've created a multi-module[1] version of the code base with a
> > corresponding JIRA issue:
> > https://issues.apache.org/jira/browse/MATH-1575
>
>
> Thanks. This is more maintainable going forward.
>
> [...]
>
> Do you propose to release v4 to get a release out with all recent bug fixes
> and then work on v5 to resolve major design issues? Or can the
> design issues be isolated to packages and thus v4 would not include
> packages that require a major redesign? [...]

As an intermediate step (which Samy Badjoudj seems to have undertaken),
the "legacy" module itself could be split into several modules, e.g.:
  * commons-math-legacy-exception
  * commons-math-legacy-linear
  * ...
Those would still be "legacy" because the split would primarily deal with
removing spurious dependencies, but still contain code that IMO should be
abandoned (like the whole exception "infrastructure") in non-"legacy" code.
More involved design issues (probably requiring more in-depth knowledge
of the algorithms and their various use-cases) and major tasks (refactoring
the "regression" package, for example) will be left for later.

Gilles




Re: [Math][Numbers][Geometry][Statistics] Road map for next release(s)

2021-05-23 Thread Gilles Sadowski
On Sun, 23 May 2021 at 22:54, Alex Herbert wrote:
>
> On Sun, 23 May 2021 at 15:58, Gilles Sadowski  wrote:
>
> >
> > I've created a multi-module[1] version of the code base with a
> > corresponding JIRA issue:
> > https://issues.apache.org/jira/browse/MATH-1575
>
>
> Thanks. This is more maintainable going forward.
>
>
> > The upcoming version of CM would depend on (non-beta) releases of
> >   * Commons Numbers
> >   * Commons Geometry
> >   * Commons Statistics
> >
> > Any objection to have those released, and then CM v4.0, ASAP?
> >
> [...]
>
>
> Do you propose to release v4 to get a release out with all recent bug fixes

Yes.

> and then work on v5 to resolve major design issues?

This could be done incrementally in 4.x releases if new modules are added
and the corresponding legacy code marked as deprecated but not removed.
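
A minimal sketch of what "deprecated but not removed" could mean in practice:
the legacy entry point stays, but delegates to the code in the new module.
Both class names ("Mean" and "LegacyMean") are hypothetical and do not refer
to actual classes of the code base.

```java
/** Hypothetical class living in a new, dedicated module (assumption). */
final class Mean {
    private Mean() {}

    static double of(double... values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

/** Hypothetical class remaining in the "legacy" module for compatibility (assumption). */
final class LegacyMean {
    private LegacyMean() {}

    /**
     * @deprecated Use {@link Mean#of(double...)} instead.
     */
    @Deprecated
    static double evaluate(double[] values) {
        // No duplicated logic: the legacy API forwards to the new module.
        return Mean.of(values);
    }
}
```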

> Or can the
> design issues be isolated to packages and thus v4 would not include
> packages that require a major redesign?

It would not be easy to sort out what must be released because some
bug was fixed from what should not be because some bug wasn't
fixed yet...

> For example I recall there are
> issues with math3.stat.descriptive.moment but the attempt to move these to
> Statistics in GSOC 2019 was not completed. So for example could the entire
> stat package not be released in v4 and new code would be targeted to
> Statistics?

There are design issues (big or small) everywhere; if some "legacy" codes
are not released, it would entail that we support 2 versions of the
library (i.e. the old v3.6.1 for all the bits not released in v4.0).

The overall problem with CM is that we effectively do not support the whole
code base (e.g. several reported bugs linger due to lack of expertise in the
concerned area).
Of course, it was the main reason for developing and releasing more focused
components that were within the area of interest of the currently active
developers.

Regards,
Gilles




[Math][Numbers][Geometry][Statistics] Road map for next release(s)

2021-05-23 Thread Gilles Sadowski
Hi.

Following recent discussions (with too few participants), no
consensus emerged about the best way to support the [Math]
component.

I've created a multi-module[1] version of the code base with a
corresponding JIRA issue:
https://issues.apache.org/jira/browse/MATH-1575

The new layout of the [Math] maven project is in a "git" branch
named "modularized_master":

https://gitbox.apache.org/repos/asf?p=commons-math.git;a=shortlog;h=refs/heads/modularized_master

It already features several modules:
  * commons-math-transform
  * commons-math-neuralnet
  * commons-math-legacy

There is also
  * commons-math-examples
with "sub-modules" each with an executable application. [See also
MATH-1580]

Branch "modularized_master" is available for review.
[Help needed for the "CheckStyle" issue (MATH-1576).]

Module "commons-math-legacy" contains the codes that haven't
yet been refactored into specific functionalities in order to make it
into a dedicated module.

Functionalities that were discussed relatively recently (candidate
for modularization):
  * Genetic algorithm (in "o.a.c.math4.legacy.genetics")
  * Clustering (in "o.a.c.math4.legacy.ml.clustering")
  * Regression (in "o.a.c.math4.stat.regression")
  * Alternative to JDK "Math" class (in "o.a.c.math4.util.FastMath")
  * ...

Are people (Avijit Basak, Erik Svensson, Samy Badjoudj, ...) who
expressed interest in these areas of CM still willing to contribute?
[Please start new threads for discussing the specifics of each
candidate module.]
Module "neuralnet" can serve as a template and illustrates the
refactoring aimed at a library JAR depending on Java 8 and on truly
low-level Commons components, such as [RNG] or [Numbers],
and *not* depending on the "legacy" module.

The upcoming version of CM would depend on (non-beta) releases of
  * Commons Numbers
  * Commons Geometry
  * Commons Statistics

Any objection to have those released, and then CM v4.0, ASAP?

Regards,
Gilles

[1] This will unfortunately not fix the (design and maintenance) issues
exposed along the years.




Re: [math] working on increasing performance for Math3

2021-05-21 Thread Gilles Sadowski
Ping.

I've created a "neuralnet" module (for the ANN/SOFM functionality), in the
"modularized_master" branch, that can be used as a template for the feature
discussed in this thread.  Are you still interested in implementing it?
If so, I'd suggest
  commons-math-jdkmath
for the name.
It should be a "core" module (independent of the "legacy" module).

Regards,
Gilles

On Sat, 15 May 2021 at 18:38, Gilles Sadowski wrote:
>
> On Wed, 12 May 2021 at 13:23, Erik Svensson wrote:
> >
> > Howdy all,
> >
> > Irrespective of how it's implemented we would like to implement the 
> > performance improvements possible to the commons-math lib.
> > While I can certainly fork the project and we could have our own version of 
> > commons-math, we would like to contribute.
>
> Please use the newly-created branch.[1]
> [Thus: You should try to take all the necessary code/packages
> out of the "legacy" module and create a new module that would provide
> the functionality which you want to achieve.]
>
> Please file a JIRA report with the proposal (e.g. as a "sub-task" of the
> MATH-1575 report).[2]
>
> Thanks,
> Gilles
>
> [1] 
> https://gitbox.apache.org/repos/asf?p=commons-math.git;a=shortlog;h=refs/heads/modularized_master
> [2] https://issues.apache.org/jira/browse/MATH-1575
>
> > [...]




Re: [math] working on increasing performance for Math3

2021-05-15 Thread Gilles Sadowski
On Wed, 12 May 2021 at 13:23, Erik Svensson wrote:
>
> Howdy all,
>
> Irrespective of how it's implemented we would like to implement the 
> performance improvements possible to the commons-math lib.
> While I can certainly fork the project and we could have our own version of 
> commons-math, we would like to contribute.

Please use the newly-created branch.[1]
[Thus: You should try to take all the necessary code/packages
out of the "legacy" module and create a new module that would provide
the functionality which you want to achieve.]

Please file a JIRA report with the proposal (e.g. as a "sub-task" of the
MATH-1575 report).[2]

Thanks,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=shortlog;h=refs/heads/modularized_master
[2] https://issues.apache.org/jira/browse/MATH-1575

> [...]




Re: [rng] Add ObjectSampler interfaces and a CompositeSampler

2021-05-15 Thread Gilles Sadowski
Hi Alex.

Would the proposal be any different if it were written with Java 8+ features?
[IOW, is it still useful (in any sense) to stick with Java 6?]

Regards,
Gilles

> [...]




Re: [text][geometry] DoubleFormats utility

2021-05-14 Thread Gilles Sadowski
Hi.

On Fri, 14 May 2021 at 04:17, Matt Juntunen wrote:
>
> Hello,
>
> Yes, the JDK definitely has number formatting capabilities. The class that I 
> propose moving to text was designed specifically for data IO operations, 
> where large numbers of doubles need to be serialized to strings in some 
> standard, non-localized format. I was unable to find exactly what I wanted 
> for this in the JDK, so I wrote my own class. The main advantages to this 
> code are that the produced formatters are 1) completely thread-safe,

Out of curiosity, what are use-cases for this feature?

> 2) easy to use, 3) offer a range of formats, and 4) are at least as 
> performant as BigDecimal and DecimalFormat.

As a concrete discussion point, it might be interesting to post benchmark
comparisons.

Regards,
Gilles

>> [...]




Re: [math] working on increasing performance for Math3

2021-05-12 Thread Gilles Sadowski
Hi.

On Wed, 12 May 2021 at 13:23, Erik Svensson wrote:
>
> Howdy all,
>
> Irrespective of how it's implemented we would like to implement the 
> performance improvements possible to the commons-math lib.
> While I can certainly fork the project and we could have our own version of 
> commons-math, we would like to contribute.

Thanks.

> If there is no or little interest in a more performant commons-math (one that 
> uses jlM where appropriate), then I will fork and maybe submit patches.
> If there is interest, then I would like to have suggestions on the how and 
> what.

I mentioned some, although I do not know which are realistic or simply useful.

> How would you like it to work, what would you like to see?
> Should you be able to choose between jlM or FastMath, for instance.

At first sight, I'd think so.

>
> Ps
> I asked about bytebuddy because of point 5 on the 
> https://commons.apache.org/proper/commons-math/

As of the current development version of Commons Math[1], the JDK
dependency has been upgraded to at least Java 8.

The features (whatever they may be) could also be available, or not,
depending on the JVM being used.

Regards,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;h=refs/heads/master;hb=refs/heads/master




Re: [math] working on increasing performance for Math3

2021-05-08 Thread Gilles Sadowski
>> [...]
>
> It might make sense to update the FastMath Javadoc to clarify that the
> main focus of the class is now accuracy (and portability?) rather than
> speed.

Done.

> This should help manage user expectations.

Yet the idea that, through some API (TBD), an application could select
which implementation to use seems useful.
The "legacy" class (still to be called "FastMath"?) would be untouched
(and the reference for any function where the priority is accuracy over
speed).

> > > [...]
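
Since the API is still TBD, the following is only one possible shape for a
user-selectable implementation, not a proposal for the actual design.  The
enum "MathBackend" is a hypothetical name, and "StrictMath" is used purely as
a self-contained stand-in for the accuracy-first implementation (it is not
"FastMath").

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: the caller picks which implementation backs each function. */
enum MathBackend {
    /** Delegates to "java.lang.Math" (which the JVM may replace with intrinsics). */
    JDK(Math::sin, Math::cos),
    /** Stand-in for an accuracy-first implementation (assumption: "StrictMath" as placeholder). */
    REFERENCE(StrictMath::sin, StrictMath::cos);

    private final DoubleUnaryOperator sinFunction;
    private final DoubleUnaryOperator cosFunction;

    MathBackend(DoubleUnaryOperator sinFunction, DoubleUnaryOperator cosFunction) {
        this.sinFunction = sinFunction;
        this.cosFunction = cosFunction;
    }

    double sin(double x) {
        return sinFunction.applyAsDouble(x);
    }

    double cos(double x) {
        return cosFunction.applyAsDouble(x);
    }
}
```

An application would then write, e.g., "MathBackend.JDK.sin(x)" where speed
matters and "MathBackend.REFERENCE.sin(x)" where accuracy has priority; whether
the extra indirection costs anything measurable would have to be benchmarked.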




Re: [math] working on increasing performance for Math3

2021-05-08 Thread Gilles Sadowski
Hello.

On Sat, 8 May 2021 at 08:42, Benjamin Marwell wrote:
>
> Instead of using byte buddy, why not just maven multi release jars? *1

Thanks for the suggestion.
Again, I don't know how to do it concretely[1] nor the implications
(IIRC, the mention of multi-release JARs was frowned upon some
time ago).

> The Java9+ impl will go to META-INF/java9 or so. We did that in the
> maven-jlink-plugin for example. *2
> Will be much faster and work on Java 16+. Much easier to test.

All good points, I guess.

However, what about the potential of having a user-selectable
version of the implementations of the functions contained in
the JDK's "Math" class?

Unless one can provide a reference that, from Java 9 on, the
accuracy cannot be improved upon what the JDK computes,
the point remains that "FastMath" is more accurate, so that we
cannot assume that a transparent call to "java.lang.Math" won't
have side-effects.
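
One way to quantify such side-effects would be to measure how far two
implementations disagree, in units of ULP, over a range of arguments.  The
sketch below assumes nothing about the outcome; "UlpComparison" is a
hypothetical helper, and in an actual check one of the two functions would be
the "FastMath" method under scrutiny.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: largest disagreement (in ULPs of the second result) between two functions. */
final class UlpComparison {
    private UlpComparison() {}

    /**
     * @param f First implementation.
     * @param g Second implementation (its result defines the ULP size).
     * @param from Lower bound of the sampled interval.
     * @param to Upper bound of the sampled interval.
     * @param steps Number of equal subdivisions (must be positive).
     * @return the maximum of |f(x) - g(x)| / ulp(g(x)) over the sampled points.
     */
    static double maxUlpDifference(DoubleUnaryOperator f,
                                   DoubleUnaryOperator g,
                                   double from,
                                   double to,
                                   int steps) {
        if (steps <= 0) {
            throw new IllegalArgumentException("steps must be positive");
        }
        double max = 0;
        for (int i = 0; i <= steps; i++) {
            final double x = from + (to - from) * i / steps;
            final double a = f.applyAsDouble(x);
            final double b = g.applyAsDouble(x);
            final double diff = Math.abs(a - b) / Math.ulp(b);
            if (diff > max) {
                max = diff;
            }
        }
        return max;
    }
}
```

For instance, "UlpComparison.maxUlpDifference(Math::sin, StrictMath::sin, 0, 10, 100000)"
reports the disagreement between the two JDK classes on the running JVM; a
uniform grid is of course only a crude probe compared to a comparison against
a high-precision reference.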

> Drawback: bad IDE support

Personally I don't care. ;-)

Regards,
Gilles

[1] If you'd like to propose a patch/PR, I could set up a dedicated branch.

>
> 1: https://maven.apache.org/plugins/maven-compiler-plugin/multirelease.html
>
> 2:
> https://github.com/apache/maven-jlink-plugin/commit/f8bdf5050c266854524aaa51eb36109c00ca692a
>
> HTH
> Ben
>
>
> On Fri, 7 May 2021, 10:52 Erik Svensson,  wrote:
>
> > Howdy all!
> >
> >
> >
> > I’m looking to do some work on FastMath to increase performance by using
> > java.lang.Math where it is applicable (ie where there is an
> > @HotspotIntrinsicCandidate annotation on the method).
> >
> > Since HIC was introduced in java 9 and the code needs to work on pre-java
> > 9 and I don’t want to compromise performance (since performance is the
> > whole reason I’m doing this) I’m thinking about using ByteBuddy to
> > construct a proxy class but I’m unsure whether that is allowed in Apache
> > Commons.
> >
> > Btw, I’ve tested using MethodHandles but that consumed almost all the
> > performance improvement java.lang.Math had over FastMath (for sin anyway).
> >
> >
> >
> > Cheers
> >
> >
> >
> > *Erik Svensson*




Re: [math] working on increasing performance for Math3

2021-05-07 Thread Gilles Sadowski
Hello.

On Fri, 7 May 2021 at 10:52, Erik Svensson wrote:
>
> Howdy all!
>
>
>
> I’m looking to do some work on FastMath

Thanks for your interest.

> to increase performance by using java.lang.Math where it is applicable (ie 
> where there is an @HotspotIntrinsicCandidate annotation on the method).

Please be sure to have a look at the open issues that mention
"FastMath" on the bug tracking system.[1]

>
> Since HIC was introduced in java 9 and the code needs to work on pre-java 9 
> and I don’t want to compromise performance (since performance is the whole 
> reason I’m doing this)

One important point is that "FastMath" was probably not a
good name, as the most consistent feature is accuracy,
better than JDK's "Math" at the time.  If that's still the case,
we may not want to lose that.

> I’m thinking about using ByteBuddy to construct a proxy class

I don't know anything about it, but it seems a way to implement
the feature that will let the caller decide what is more important
(precision or performance).  Correct?

> but I’m unsure whether that is allowed in Apache Commons.

It's under ASL 2.0.
So the only question would be whether a dependency towards
a library outside Commons is acceptable.
Could this be implemented around an _optional_ dependency?

>
> Btw, I’ve tested using MethodHandles but that consumed almost all the 
> performance improvement java.lang.Math had over FastMath (for sin anyway).
>

Did you use JMH for the benchmark?[2]
The first step is perhaps to open a new JIRA issue and post the
result over there.
We want to modularize Commons Math, so setting up a module
for JMH testing is welcome.
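
For reference, the kind of "MethodHandle"-based dispatch mentioned in the
quoted message could look like the sketch below.  It is an assumption about
what was tested, not the actual code, and "HandleDispatch" is a hypothetical
name.  A JMH benchmark would compare this call path with a direct call to
"Math.sin".

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Sketch: calls "java.lang.Math.sin" through a method handle resolved once. */
final class HandleDispatch {
    /** Handle of type (double)double, looked up at class initialization. */
    private static final MethodHandle SIN;

    static {
        try {
            SIN = MethodHandles.publicLookup().findStatic(
                Math.class, "sin", MethodType.methodType(double.class, double.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private HandleDispatch() {}

    static double sin(double x) {
        try {
            // "invokeExact" requires the call site to match the handle type exactly.
            return (double) SIN.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Keeping the handle in a "static final" field is what usually gives the JIT
the best chance to inline the call; any conclusion about the overhead should
come from measurements rather than from this sketch.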

Regards,
Gilles

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20MATH%20AND%20status%20%3D%20Open%20AND%20description%20%20~%20%22FastMath%22
[2] At the time, we implemented (2) custom benchmarks (see the test sources).

>
> Cheers
>
>
>
> Erik Svensson
>




Re: The case for a Commons component

2021-05-06 Thread Gilles Sadowski
On Thu, 6 May 2021 at 20:29, Oliver Heger wrote:
>
>
>
> On 5 May 2021 at 21:54, Gilles Sadowski wrote:
> > On Wed, 5 May 2021 at 20:33, Oliver Heger wrote:
> >>
> >>
> >>
> >> On 5 May 2021 at 20:26, Gilles Sadowski wrote:
> >>> On Wed, 5 May 2021 at 18:57, Gary Gregory wrote:
> >>>>
> >>>> IMO the lack of +1s shows the lack of appetite to manage another 
> >>>> component
> >>>
> >>> That's certainly true.
> >>> And nobody is forced to do anything.
> >>>
> >>> When the other CM spin-offs started, there was only _one_ person
> >>> willing to do the work.
> >>
> >> What about the sandbox? IIUC, every committer can start a new component
> >> there. If then a community forms around this component, it can move to
> >> proper (which would then require a vote).
> >>
> >> Would this be an option to get started?
> >
> > [Graph] is listed in the sandbox[1], yet when someone expressed a 
> > willingness
> > to contribute, we had a "git" repository created[2] (even though the
> > web site has
> > remained outdated[3], probably because the attempt was short-lived).
> >
> > So indeed, I could have already created the repository a few weeks ago...
> >
> > However in this instance, what would it mean to have codes that have lived
> > within a "proper" component for 6 years and more be moved to "sandbox"?
>
> A way to move forward?

Thanks for trying to be constructive (and for the decent tone).

I've been told that I should learn to count; that the vote (to
create a repository) has failed.
Hence that option has also been ruled out.  [What was OK for
[Graph] in sandbox, somehow is not anymore.  Go figure...]

Gilles

>
> Oliver
>
> >
> > Regards,
> > Gilles
> >
> > [1] http://commons.apache.org/sandbox/commons-graph/
> > [2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
> > [3] http://commons.apache.org/sandbox/commons-graph/source-repository.html




Re: The case for a Commons component

2021-05-06 Thread Gilles Sadowski
> > [...]
> >>
> >> So a procedural vote requires a majority.
> >
> > There is a small majority (irrespective of the binding vs non-binding
> > categories).
>
> In votes ONLY PMC member votes are counted. Other votes are
> advisory. PMC members should take those votes into account
> when voting.

That's the point indeed: the "advisory" information was not taken
into account.

Last time the PMC turned down a contribution, the conversation
had made it clear that the donating people did not intend to
support it.
Here we have the "opposite" case: Code that is rotting here could
be taken back to life.  Yet it seems that sparing some bits on the
ASF servers is more important than having people feel welcome
to contribute here.

> If you don’t understand that concept you shouldn’t
> be on a PMC.

Sure. There is a "concept" for that nowadays: Cancel culture...

> Trying to justify creating a new Commons component by endlessly
> discussing the topic just isn’t going to work.
>
> I’ll not be responding to more emails on this thread

... exactly (see above).

> as I consider the
> matter closed.


Gilles

>
> Ralph




Re: The case for a Commons component

2021-05-06 Thread Gilles Sadowski
On Thu, 6 May 2021 at 14:48, Emmanuel Bourg wrote:
>
> On 2021-05-06 13:06, Gilles Sadowski wrote:
>
> > It is not nice to decide for others what they may need.
>
> It is not nice to suggest I shouldn't voice my opinions.

Your argued opinion is welcome.
In the text which you cut, you *explicitly* said that I should
go somewhere else (GitHub or whatever).

>
> > It would have been courteous to acknowledge the answers to
> > your argument against having a dedicated component
>
> I've little appetite for lengthy debate with you again.

There is/was no debate (as in: "an exchange of arguments" or
"trying to get consensus" or "not forcing me to do what I think is
bad"), you state your opinion (as mentioned above) and that's it.

> > My rationale, for whether a specific component is needed, has
> > always been the same: Define a scope (and stick to it).
> > You seem to find this acceptable for any Commons project except
> > those which you tagged as "math-related".
>
> The machine learning scope is too wide, it doesn't belong here.

I agree that it is wide, but much less so than "math", yet you never
voiced such an opinion against CM (while I did).

> > So I'm asking: Will it make any difference if the "machine learning"
> > codes are further developed within [Math]?  Concretely:
> >  * Would you vote to release CM v4.0?
> >  * Would you help (more than if the ML codes were in a
> >specific component) to review/merge the PRs?
>
> I would vote favorably for a modularized CM 4.0 release,

I really (really, really) can't figure out how you can reconcile that a
library (CM) that *contains* a ML subset which you deem too big
to be a Commons component, is not too big to be a Commons
component!

The spin-offs from CM do solve the issue of "too wide scope" that
doomed CM.
And again: I agree that "machine learning" may be too wide a
scope itself; grouping all such algorithms in a single component
was already a compromise wrt having each ML field in its own,
especially if we aimed at some common goal (multi-threading) that
could lead to shared code (not the math algorithms but, o.a. things,
the threads management).

> but I still
> think that the math related components would be best served in their own
> TLP with a dedicated community

When this was brought up somewhat seriously, most of the
PMC voted against.
Then last time (IIRC) the idea was floated, there wasn't the
minimum of people required to support a TLP.  [FTR, that was
the practical reason these codes are here (as is the case for all the
other Commons components): a place where more people can
contribute to otherwise orphaned libraries.]

OK, then let's move on; thus I'm asking who in this PMC, is
now willing to provide the necessary clearance for an internal
fork of the math-related codes for which it is deemed that they
are not a good fit for Commons?

> free of the Apache Commons rules and
> constraints.

I'm still to be shown what rules I'd be asking to be free of.

Gilles

>
> Emmanuel Bourg
>




Re: The case for a Commons component

2021-05-06 Thread Gilles Sadowski
On Thu, 6 May 2021 at 02:24, Emmanuel Bourg wrote:
>
> On 2021-05-05 20:31, Oliver Heger wrote:
>
> > What about the sandbox? IIUC, every committer can start a new
> > component there. If then a community forms around this component, it
> > can move to proper (which would then require a vote).
>
> With the various source hosting solutions available today we no longer
> need the sandbox, and I think we should discontinue this practice. The
> machine learning library could as well start its life on GitHub, it
> doesn't need Apache Commons.

It is not nice to decide for others what they may need.

It would have been courteous to acknowledge the answers to
your argument against having a dedicated component (to more
efficiently manage codes that have already been accepted within
the "Commons" project, as part of CM), and explain
 * why those answers would not make you withdraw your -1,
 * why the ASF would be better off without the offered contribution,
 * why some initiatives in Commons deserve a worse treatment
   than others.

My rationale, for whether a specific component is needed, has
always been the same: Define a scope (and stick to it).
You seem to find this acceptable for any Commons project except
those which you tagged as "math-related".

So I'm asking: Will it make any difference if the "machine learning"
codes are further developed within [Math]?  Concretely:
 * Would you vote to release CM v4.0?
 * Would you help (more than if the ML codes were in a
   specific component) to review/merge the PRs?

Gilles

>
> Emmanuel Bourg
>




Re: The case for a Commons component

2021-05-06 Thread Gilles Sadowski
On Thu, 6 May 2021 at 07:53, Ralph Goers wrote:
>
>
> > On May 5, 2021, at 11:13 AM, Gilles Sadowski  wrote:
> >
> > On Wed, 5 May 2021 at 17:44, Ralph Goers wrote:
> >>
> >>
> >>
> >>> On May 5, 2021, at 6:38 AM, Gilles Sadowski  wrote:
> >>>
> >>> On Tue, 4 May 2021 at 02:49, Ralph Goers wrote:
> >>>>
> >>>> I apologize. I started another thread regarding the vote before seeing 
> >>>> this.
> >>>
> >>> No problem.
> >>>
> >>>> Maybe that will get more attention?
> >>>
> >>> It doesn't seem so. :-}
> >>>
> >>> IMHO, valid answers have been given to the statements/questions
> >>> from people who didn't vote +1.
> >>> The very low turnout makes the arithmetics of the result fairly 
> >>> subjective...
> >>>
> >>> The optimistic view is that
> >>> 1. most people don't care (that the repository is created),
> >>> 2. there is no reason to doubt the infos provided by actual users of
> >>> those codes,
> >>> 3. there is an embryo of a community (perhaps not viable, but only
> >>> the future can tell...),[1]
> >>> 4. the same kind of welcoming gestures should apply for the proposed
> >>> contributions, as for the attempt to resuscitate "Commons Graph"[2],
> >>> even if some of the PMC might arguably prefer another option.
> >>
> >> Regardless, following https://www.apache.org/foundation/voting.html
> >> indicates that this vote is not going to pass.
> >
> > How so?
> > [It's not about a code change; and no "technical argument" can be invoked.]
>
> It looks like you didn’t read the page.

I did, of course. And my interpretation differs.

> For clarity I am copying it here
>
> "Votes on procedural issues follow the common format of majority rule unless
> otherwise stated. That is, if there are more favourable votes than
> unfavourable ones, the issue is considered to have passed -- regardless of
> the number of votes in each category. (If the number of votes seems too
> small to be representative of a community consensus, the issue is typically
> not pursued. However, see the description of
> lazy consensus <https://www.apache.org/foundation/voting.html#LazyConsensus>
> for a modifying factor.)"
>
>
> So a procedural vote requires a majority.

There is a small majority (irrespective of the binding vs non-binding
categories).

> But note that it also calls out that if the number of voters
> seems too small then the issue is usually not pursued.

"usually"...
In Commons, the number of votes has always been low, in
proportion of the official number of committers.
No surprise that, for very specific functionalities, it is even
lower.
However the main point should rather have been whether
the perspective exists that someone will do the work for
getting a chance for a community to ever exist.
In the case of ML algorithms, a discussion started that has
involved 4 people (among them 2 PMC people); this is largely
more than the "usual" attendance about any one specific
component's issue.

>  Both of these describe this situation perfectly.
> The vote did not get a majority of binding votes (it was a tie) and the 
> number of votes was very small.
>
>
> >
> >> You can’t assert lazy consensus on an explicit vote.  If you had started
> >> this as a lazy consensus vote it is likely it would have still gotten a
> >> -1 vote since both Sebb and Emmanuel have voiced opposition.
> >
>
>
> > A "veto" does not apply here.
> > Hence my remark on the "arithmetics" since the total tally is slightly
> > "pro" although the PMC tally is slightly "con”.
>
> Where did I use the word “veto”? I never used the word “veto”.

I was trying to figure out how you reached your conclusion from the
page which you referred to (i.e. how a "-1" vote would be sufficient).

> There are essentially 3 ways to vote,
> Yes, No, and Abstain. In a procedural vote +0 or -0 represent an abstention.
> Anything less than 0 is a No and anything greater is a Yes. So saying there
> were -1 votes implies there are “No” votes and therefore there is no
> consensus.

Oliver reminded us that "[...] every committer can start a new
component [in the sandbox]".
Your interpretation of the procedural vote seems to mean that
anyone else can prevent such an initiative.

Gilles




Re: The case for a Commons component

2021-05-05 Thread Gilles Sadowski
On Wed, 5 May 2021 at 20:33, Oliver Heger wrote:
>
>
>
> On 05.05.21 at 20:26, Gilles Sadowski wrote:
> > On Wed, 5 May 2021 at 18:57, Gary Gregory wrote:
> >>
> >> IMO the lack of +1s shows the lack of appetite to manage another component
> >
> > That's certainly true.
> > And nobody is forced to do anything.
> >
> > When the other CM spin-offs started, there was only _one_ person
> > willing to do the work.
>
> What about the sandbox? IIUC, every committer can start a new component
> there. If then a community forms around this component, it can move to
> proper (which would then require a vote).
>
> Would this be an option to get started?

[Graph] is listed in the sandbox[1], yet when someone expressed a willingness
to contribute, we had a "git" repository created[2] (even though the
web site has remained outdated[3], probably because the attempt was
short-lived).

So indeed, I could have already created the repository a few weeks ago...

However in this instance, what would it mean to have codes that have lived
within a "proper" component for 6 years and more be moved to "sandbox"?

Regards,
Gilles

[1] http://commons.apache.org/sandbox/commons-graph/
[2] https://gitbox.apache.org/repos/asf?p=commons-graph.git
[3] http://commons.apache.org/sandbox/commons-graph/source-repository.html




Re: The case for a Commons component

2021-05-05 Thread Gilles Sadowski
On Wed, 5 May 2021 at 18:57, Gary Gregory wrote:
>
> IMO the lack of +1s shows the lack of appetite to manage another component

That's certainly true.
And nobody is forced to do anything.

When the other CM spin-offs started, there was only _one_ person
willing to do the work.

Gilles

> [...]




Re: The case for a Commons component

2021-05-05 Thread Gilles Sadowski
On Wed, 5 May 2021 at 17:44, Ralph Goers wrote:
>
>
>
> > On May 5, 2021, at 6:38 AM, Gilles Sadowski  wrote:
> >
> > On Tue, 4 May 2021 at 02:49, Ralph Goers wrote:
> >>
> >> I apologize. I started another thread regarding the vote before seeing 
> >> this.
> >
> > No problem.
> >
> >> Maybe that will get more attention?
> >
> > It doesn't seem so. :-}
> >
> > IMHO, valid answers have been given to the statements/questions
> > from people who didn't vote +1.
> > The very low turnout makes the arithmetics of the result fairly 
> > subjective...
> >
> > The optimistic view is that
> >  1. most people don't care (that the repository is created),
> >  2. there is no reason to doubt the infos provided by actual users of
> > those codes,
> >  3. there is an embryo of a community (perhaps not viable, but only
> > the future can tell...),[1]
> >  4. the same kind of welcoming gestures should apply for the proposed
> > contributions, as for the attempt to resuscitate "Commons Graph"[2],
> > even if some of the PMC might arguably prefer another option.
>
> Regardless, following https://www.apache.org/foundation/voting.html
> indicates that this vote is not going to pass.

How so?
[It's not about a code change; and no "technical argument" can be invoked.]

> You can’t assert lazy consensus on an explicit vote.  If you had started
> this as a lazy consensus vote it is likely it would have still gotten a
> -1 vote since both Sebb and Emmanuel have voiced opposition.

A "veto" does not apply here.
Hence my remark on the "arithmetics" since the total tally is slightly
"pro" although the PMC tally is slightly "con".

Gilles

>
> Ralph
>




Re: Jacoco 0.8.7

2021-05-05 Thread Gilles Sadowski
Hi.

On Wed, 5 May 2021 at 13:11, Melloware wrote:
>
> Jacoco 0.8.7 was released:

Thanks for the info.

>
> https://github.com/jacoco/jacoco/releases/tag/v0.8.7
>
> Any commons projects using Jacoco like BeanUtils2 should upgrade

It seems that this is usually done through "Commons Parent":
https://downloads.apache.org/commons/commons-parent/RELEASE-NOTES.txt

Gilles

> as it
> fixes JDK 15 and 16 issues.




Re: The case for a Commons component

2021-05-05 Thread Gilles Sadowski
On Tue, 4 May 2021 at 02:49, Ralph Goers wrote:
>
> I apologize. I started another thread regarding the vote before seeing this.

No problem.

> Maybe that will get more attention?

It doesn't seem so. :-}

IMHO, valid answers have been given to the statements/questions
from people who didn't vote +1.
The very low turnout makes the arithmetics of the result fairly subjective...

The optimistic view is that
  1. most people don't care (that the repository is created),
  2. there is no reason to doubt the infos provided by actual users of
those codes,
  3. there is an embryo of a community (perhaps not viable, but only
the future can tell...),[1]
  4. the same kind of welcoming gestures should apply for the proposed
contributions, as for the attempt to resuscitate "Commons Graph"[2],
even if some of the PMC might arguably prefer another option.

Regards,
Gilles

[1] Three Java implementations of the SOFM turned up as the top results
of a web search; none seem to include multi-threading.
[2] https://gitbox.apache.org/repos/asf?p=commons-graph.git


>
> Ralph
>
> > On May 2, 2021, at 3:59 PM, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> >> [... Discussion about GA data-structures...]
> >
> > I'd suggest that we finalize the [Vote] before getting into the
> > details...
> >
> > Currently, there have been votes by:
> >  Emmanuel Bourg (-1)
> >  Sebastian Bazley (-0)
> >  Ralph Goers (+0)
> >  Paul King (+1)
> >
> > So currently, the discussion should be focused on settling the
> > issues put forward by the opponents to having this new component:
> >  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
> >  * Problem 2: Who will contribute? (Ralph)
> >
> > Partial answers have been given.
> > We need more opinions (and votes).
> >
> > Regards,
> > Gilles




Re: The case for a Commons component

2021-05-03 Thread Gilles Sadowski
Hello.

On Mon, 3 May 2021 at 08:53, Avijit Basak wrote:
>
> Hi
>
>   I would like to vote for *commons-ml*.

Wrong thread, again.

Sorry for the nit-picking, but whenever a vote is requested, it is
often the basis of an official decision that must be traceable by
other parties, such as the ASF's INFRAstructure people.
In this case (the eventual creation of a repository, they might not
need to be involved, so I've voted on your behalf in the proper
thread (but, please, confirm by acknowledging, in that *other*
thread that the vote is according to your preference).

Thanks,
Gilles


>> [...]




Re: [Vote] Create repository for "machine learning" algorithms.

2021-05-03 Thread Gilles Sadowski
Recording a vote in the proper thread on behalf of Avijit Basak (who
mistakenly posted his vote in two other threads).

On Wed, 21 Apr 2021 at 19:05, Gilles Sadowski wrote:
>
> [...]
>
> Name of component: "Commons Machine Learning"
> Name of "git" repository: "commons-machinelearning"
> Top-level package name: "org.apache.commons.machinelearning"
>
> [...]
>
>
> Please vote:
 [X] Yes.
>   [ ] No, because ...

Gilles (on behalf of Avijit Basak)




Re: The case for a Commons component

2021-05-02 Thread Gilles Sadowski
Hi.

> [... Discussion about GA data-structures...]

I'd suggest that we finalize the [Vote] before getting into the
details...

Currently, there have been votes by:
  Emmanuel Bourg (-1)
  Sebastian Bazley (-0)
  Ralph Goers (+0)
  Paul King (+1)
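
As a rough illustration of how such a list of ballots could be counted under a simple majority rule, here is a minimal Java sketch. It assumes that "+0" and "-0" are treated as abstentions and that the issue passes only when there are strictly more favourable than unfavourable ballots; the class VoteTally and its methods are hypothetical names made up for this example, not part of any ASF tooling.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: counts ballots written as "+1", "+0", "-0" or "-1". */
final class VoteTally {
    final int yes;
    final int no;
    final int abstain;

    private VoteTally(int yes, int no, int abstain) {
        this.yes = yes;
        this.no = no;
        this.abstain = abstain;
    }

    /** Classifies each ballot by its sign; "+0" and "-0" both parse to 0. */
    static VoteTally of(List<String> ballots) {
        int yes = 0;
        int no = 0;
        int abstain = 0;
        for (String ballot : ballots) {
            int value = Integer.parseInt(ballot.trim());
            if (value > 0) {
                yes++;
            } else if (value < 0) {
                no++;
            } else {
                abstain++;
            }
        }
        return new VoteTally(yes, no, abstain);
    }

    /** Assumed rule: strictly more favourable than unfavourable ballots. */
    boolean passes() {
        return yes > no;
    }

    public static void main(String[] args) {
        // The four ballots listed above, in the same order.
        VoteTally tally = VoteTally.of(Arrays.asList("-1", "-0", "+0", "+1"));
        System.out.println(tally.yes + " for, " + tally.no + " against, "
            + tally.abstain + " abstaining; passes: " + tally.passes());
    }
}
```

With only the four ballots above, the counts are one for, one against and two abstentions, which is why more votes are needed.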

So currently, the discussion should be focused on settling the
issues put forward by the opponents to having this new component:
  * Problem 1: Functionality should go somewhere else (Emmanuel, Sebb)
  * Problem 2: Who will contribute? (Ralph)

Partial answers have been given.
We need more opinions (and votes).

Regards,
Gilles




Re: The case for a Commons component

2021-05-01 Thread Gilles Sadowski
On Fri, 30 Apr 2021 at 17:40, Avijit Basak wrote:
>
> Hi
>
>  >>lot of spurious references to "Commons Numbers"
>  --I have only created the basic project structure. Changes
> need to be made. Can anyone from the existing commons team help in doing
> this?

Well, you should "search and replace":
  "Numbers" -> "Machine Learning"
  commons-numbers -> commons-machinelearning

Other things (repository URL, JIRA project name and URL) require that
a component be created (vote is pending).
[As long as those files are not part of a PR, it is not urgent to fix them.]

>  >> For sure, populate it with the code extracted from CM's
> "genetics"
> package and proceed with the enhancements.
> At first, I'd suggest to refactor the layout of the package (i.e. create
> a "subpackage" for each component of a genetic algorithm).
>   -- I am working on it.

Great!

> Did not commit the code till now.

OK.  When you do, please ask for review on the "dev" ML.

>   >>  Then some examination of the data-structures is required (a
> binary chromosome is currently stored as a "List").
>   -- I have recently done some work on this. Could you please
> check this article and share your thoughts:
>   https://arxiv.org/abs/2103.04751

Alex already provided a thorough response.
It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
for a readily usable implementation of a "binary chromosome".
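
To make the point about the missing "append" concrete, here is a minimal sketch of a thin wrapper around the JDK's BitSet. It tracks the number of alleles explicitly, because BitSet.length() only reports the index of the highest set bit plus one, so trailing zeros would otherwise be lost. The class name BinaryChromosome and its methods are assumptions made for this example; this is not the existing CM class.

```java
import java.util.BitSet;

/** Hypothetical binary chromosome backed by a BitSet plus an explicit length. */
final class BinaryChromosome {
    private final BitSet bits = new BitSet();
    private int length;

    /** Appends a single allele at the end. */
    BinaryChromosome append(boolean allele) {
        bits.set(length, allele);
        length++;
        return this;
    }

    /** Appends all alleles of another chromosome, keeping trailing zeros. */
    BinaryChromosome append(BinaryChromosome other) {
        final int n = other.length;
        for (int i = 0; i < n; i++) {
            bits.set(length + i, other.bits.get(i));
        }
        length += n;
        return this;
    }

    boolean get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return bits.get(index);
    }

    int length() {
        return length;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BinaryChromosome head = new BinaryChromosome()
            .append(true).append(false).append(true).append(false);
        BinaryChromosome tail = new BinaryChromosome().append(false).append(false);
        System.out.println(head.append(tail)); // trailing zeros are preserved
    }
}
```

Whether such a wrapper is preferable to a "List" is exactly the kind of question that benchmarks should settle.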

Do you think that allele sets other than binary would be useful to
implement? [IIUC your document above, it seems not (?).]

>   Are we thinking to use Spark for our parallelism

No, if the code is to reside in Commons.

> or a simple
> multi-threading of Java.

Yes, we'd depend only on JDK classes.
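
For illustration, a minimal sketch of what "JDK classes only" could look like for evaluating the fitness of a population in parallel, using java.util.concurrent. The fitness function (number of set bits, i.e. "OneMax") and the names ParallelFitness, oneMax and evaluate are assumptions chosen for the example; nothing here is taken from the CM "genetics" package.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: parallel fitness evaluation with a fixed thread pool. */
final class ParallelFitness {
    /** Illustrative fitness: the number of bits set in the chromosome. */
    static int oneMax(BitSet chromosome) {
        return chromosome.cardinality();
    }

    /** Returns the fitness values in the same order as the population. */
    static int[] evaluate(List<BitSet> population, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (final BitSet chromosome : population) {
                tasks.add(() -> oneMax(chromosome));
            }
            // invokeAll returns the futures in the order of the submitted tasks.
            List<Future<Integer>> results = pool.invokeAll(tasks);
            int[] fitness = new int[results.size()];
            for (int i = 0; i < fitness.length; i++) {
                fitness[i] = results.get(i).get();
            }
            return fitness;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while evaluating fitness", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Fitness evaluation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0);
        a.set(3);
        BitSet b = new BitSet();
        List<BitSet> population = new ArrayList<>();
        population.add(a);
        population.add(b);
        int[] fitness = evaluate(population, 2);
        System.out.println(fitness[0] + " " + fitness[1]);
    }
}
```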

> I would prefer to use java multi-threading and
> avoid any other framework.
>   In java we don't have any library which can be used for AI/ML
> programming with a very minimal learning curve. Can we think of fulfilling
> this need?

That would be nice. Don't hesitate to enlist fellow programmers. :-)

Regards,
Gilles

>   This will be helpful for many java developers to venture into
> AI/ML without learning a new language like Python.
>
>
>>> [...]




Re: [Vote] Create a "machine learning" component

2021-04-30 Thread Gilles Sadowski
On Fri, 30 Apr 2021 at 18:00, Avijit Basak wrote:
>
> Hi
>
>  I would like to vote for *commons-ml*.

Wrong thread (the vote on this one has been cancelled due to being
idle for too long):  The new vote is there:
   https://markmail.org/message/g5gwof3qdkzyvedc

>>>  [...]




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-29 Thread Gilles Sadowski
On Thu, 29 Apr 2021 at 01:45, sebb wrote:
>
> On Thu, 29 Apr 2021 at 00:10, Gilles Sadowski  wrote:
> >
> > It occurs to me that we *should* create a specific "git" repository
> > for holding web site contents; having the "asf-site" and "asf-staging"
> > branches in the component's repository is looking for trouble: It will
> > be too easy to commit the (generated) web files into "master"
> > instead of the appropriate branch.  [If allowed (even recommended
> > as per the doc) by INFRA, we should not frown upon the increased
> > separation of concern (source code vs web site management).]
> >
> > "Logging" has one repository for the top-level site and a separate
> > repository for every component.
> > IMO, we should do the same (and copy their ".asf.yaml" layout).
>
> You are proposing about 50 new Git repos.

Only because it seems that the functionality was intended that way.
Also: Having independent repositories seems the safest path for
experimenting mix and match; if the latter works, not all components
will use the new system, or migration can be done gradually.

> > Until we make the git switch for the live top-level site, we would indeed
> > (as you proposed) not have a "publish" section in any of the ".asf.yaml"
> > files (in any of the repositories); we'd only use the "staging" section
> > that will make the site accessible at
> > https://commons.staged.apache.org
>
> The top-level site does NOT have to be switched to Git for this to work.
> As I already wrote we can mix SVN and Git.

I propose this in order to be able to test the *full* solution without
messing with the current setup, based on what you wrote previously:
That
https://commons.staged.apache.org
would go away.  [It's not: It will be used as the staging site through the
".asf.yaml" mechanism (cf. doc).]

> But of course the way the website is built needs to be changed to
> select the individual parts as already described.
> This means a change to the svnpubsub configuration.
>
> > Any objection to creating the following repositories:
> > commons-site.git
>
> -1: it's not needed; we can still use the SVN repo.
>
> > commons-math-site.git
> > ?
>
> Fine, but please try (and document) the full process of how to stage
> the site and how to push the staged site to the asf-site branch.

Practical question: Do we care about getting lots of commit messages
sent to the commits@ ML during the test phase?
Or should I direct the traffic to some other list (which one?) in the
meantime?

> There's no point converting to Git if that process is more involved
> than the existing process.

I'm not sure that we mean the same with "the existing process".
Earlier in the thread, I've described what I do:

$ mvn site site:stage
$ cd site-content
$ rm -rf *
$ cd ../target/staging/
$ cp -r * ../../site-content
$ cd ../../site-content
$ svn status
[Use some commands to "svn add" all the new files and "svn del"
to remove all the files that do not exist anymore.]
$ svn commit

What I'd like to know is whether the "process" should be different
with the current setup.

IIUC, the ".asf.yaml" approach is to create a subdirectory for each
new version of the web site (in sync with versions of the code).

So the last two steps of the "process" above would just be (within
a newly created subdirectory):
$ git add -A
$ git commit

Gilles

> > > > [...]




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-28 Thread Gilles Sadowski
It occurs to me that we *should* create a specific "git" repository
for holding web site contents; having the "asf-site" and "asf-staging"
branches in the component's repository is looking for trouble: It will
be too easy to commit the (generated) web files into "master"
instead of the appropriate branch.  [If allowed (even recommended
as per the doc) by INFRA, we should not frown upon the increased
separation of concern (source code vs web site management).]

"Logging" has one repository for the top-level site and a separate
repository for every component.
IMO, we should do the same (and copy their ".asf.yaml" layout).

Until we make the git switch for the live top-level site, we would indeed
(as you proposed) not have a "publish" section in any of the ".asf.yaml"
files (in any of the repositories); we'd only use the "staging" section
that will make the site accessible at
https://commons.staged.apache.org

Any objection to creating the following repositories:
commons-site.git
commons-math-site.git
?

Gilles




On Wed, 28 Apr 2021 at 00:39, sebb wrote:
>
> On Tue, 27 Apr 2021 at 17:03, Ralph Goers  wrote:
> >
> >
> >
> > > On Apr 27, 2021, at 6:57 AM, Gilles Sadowski  wrote:
> > >
> > > On Tue, 27 Apr 2021 at 12:32, sebb wrote:
> > >>
> > >> On Tue, 27 Apr 2021 at 02:10, Gilles Sadowski  
> > >> wrote:
> > >>>
> > >>>>>> [...]
> > >>>>>
> > >>>>> OK to create the
> > >>>>>commons-site
> > >>>>> "git" repository?
> > >>>>
> > >>>> Are you offering to do the work?
> > >>>
> > >>> If the option is still on the table, I could test the
> > >>> website-related feature of ".asf.yaml":
> > >>>
> > >>> https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#git.asf.yamlfeatures-BranchProtection
> > >>
> > >> Please do NOT attempt to use the 'publish' feature.
> > >>
> > >> As I already wrote that will likely mangle the current website and may
> > >> require Infra assistance to untangle.
> > >
> > > I certainly do not want to do that.
> > >
> > >> Removal of a publish entry from .asf.yaml does not undo the checkout,
> > >> and only Infra have direct access to the TLP server.
> > >
> > > Alas, there is a limit to INFRA's magics... ;-)
> > >
> > >> However, you could experiment with the 'staging' feature, and see how
> > >> easy it is to publish the site to the asf-site branch.
> > >
> > > I must be missing something because I don't see what there is
> > > to do then, apart from
> > >  $ git checkout asf-site
> > >  $ mvn site site:stage
> > >
> > > And, just as now, the functional (except perhaps for the links to
> > > the top-level Commons site) static site will be under
> > >  target/staging
> > >
> >
> > ASF git-based web sites use two branches; asf-site for the live site and
> > asf-staging for the staged site. So when Sebb is telling you to work on the
> > staging site only he means commit only to the asf-staging branch.
> >
> >
> > >>
> > >> Just don't attempt to publish that branch.
> > >>
> > >>>>
> > >>>> BTW, I have found out that it is possible to combine site content from
> > >>>> SVN and Git repos in order to create the website checkout.
> > >>>> So there is no need to convert to Git.
> > >>>
> > >>> Is it the solution straightforwardly applicable to the current
> > >>> setup of the Commons web site? [So that ".asf.yaml" should
> > >>> not be used for the projects' sites.]
> > >>
> > >> AFAICT, yes.
> > >>
> > >> The website is currently taken from:
> > >>
> > >> https://svn-master.apache.org/repos/infra/websites/production/commons
> > >>
> > >> This is done as a single checkout.
> > >>
> > >> This could be changed to take the top-level website from its own
> > >> location, and the dormant, sandbox and proper trees could be checked
> > >> out into the relevant subdirectories.
> > >>
> > >> This should be fine so long as the top-level site does no

Re: The case for a Commons component

2021-04-28 Thread Gilles Sadowski
Le lun. 26 avr. 2021 à 16:18, Avijit Basak  a écrit :
>
> Hi
>
> As per previous discussions, I have created a temporary repository
> in GitHub under my personal GitHub Id(avijitbasak). The artifacts have been
> copied from commons-numbers. A preliminary structure has been created for
> the proposed component.
> Please let me know if we want to proceed with this format.

There is no source code (and a lot of spurious references to
"Commons Numbers").
For sure, populate it with the code extracted from CM's "genetics"
package and proceed with the enhancements.
At first, I'd suggest to refactor the layout of the package (i.e. create
a "subpackage" for each component of a genetic algorithm).
Then some examination of the data-structures is required (a binary
chromosome is currently stored as a "List").
Shouldn't the whole design be revised (based on interfaces and
streams)?
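
As one possible reading of "interfaces and streams", here is a minimal sketch in which a selection step is expressed as a small generic interface and implemented with the JDK's stream API. All the names (StreamSelection, SelectionPolicy, keepFittest) are hypothetical, and the fitness used in the usage example (number of '1' characters) is only illustrative; this is not a proposal for the actual API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

/** Hypothetical example of an interface-based, stream-based selection step. */
final class StreamSelection {
    /** Assumed contract: keeps part of a population for the next generation. */
    interface SelectionPolicy<C> {
        List<C> select(List<C> population, ToDoubleFunction<C> fitness);
    }

    /** Truncation selection: keeps the n individuals with the highest fitness. */
    static <C> SelectionPolicy<C> keepFittest(final int n) {
        return (population, fitness) -> population.stream()
            .sorted(Comparator.comparingDouble(fitness).reversed())
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> population = Arrays.asList("0110", "1111", "0000", "1011");
        // Illustrative fitness: the number of '1' characters in the string.
        ToDoubleFunction<String> ones = s -> s.chars().filter(ch -> ch == '1').count();
        SelectionPolicy<String> policy = keepFittest(2);
        System.out.println(policy.select(population, ones)); // prints [1111, 1011]
    }
}
```

Because the policy is generic in the chromosome type, the same code would apply whatever data-structure is eventually chosen for the binary chromosome.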

> We can copy the
> same to any other team repository if required.

That would be a repository on an ASF server, once the pending vote
process is completed.  [By the way: You didn't vote...]

Regards,
Gilles

>> [...]




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-27 Thread Gilles Sadowski
On Tue, 27 Apr 2021 at 12:32, sebb wrote:
>
> On Tue, 27 Apr 2021 at 02:10, Gilles Sadowski  wrote:
> >
> > >>> [...]
> > > >
> > > > OK to create the
> > > > commons-site
> > > > "git" repository?
> > >
> > > Are you offering to do the work?
> >
> > If the option is still on the table, I could test the
> > website-related feature of ".asf.yaml":
> > 
> > https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#git.asf.yamlfeatures-BranchProtection
>
> Please do NOT attempt to use the 'publish' feature.
>
> As I already wrote that will likely mangle the current website and may
> require Infra assistance to untangle.

I certainly do not want to do that.

> Removal of a publish entry from .asf.yaml does not undo the checkout,
> and only Infra have direct access to the TLP server.

Alas, there is a limit to INFRA's magics... ;-)

> However, you could experiment with the 'staging' feature, and see how
> easy it is to publish the site to the asf-site branch.

I must be missing something because I don't see what there is
to do then, apart from
  $ git checkout asf-site
  $ mvn site site:stage

And, just as now, the functional (except perhaps for the links to
the top-level Commons site) static site will be under
  target/staging

>
> Just don't attempt to publish that branch.
>
> > >
> > > BTW, I have found out that it is possible to combine site content from
> > > SVN and Git repos in order to create the website checkout.
> > > So there is no need to convert to Git.
> >
> > Is it the solution straightforwardly applicable to the current
> > setup of the Commons web site? [So that ".asf.yaml" should
> > not be used for the projects' sites.]
>
> AFAICT, yes.
>
> The website is currently taken from:
>
> https://svn-master.apache.org/repos/infra/websites/production/commons
>
> This is done as a single checkout.
>
> This could be changed to take the top-level website from its own
> location, and the dormant, sandbox and proper trees could be checked
> out into the relevant subdirectories.
>
> This should be fine so long as the top-level site does not have any
> files in those 3 subdirectories.
>
> For an example of this, go to
> https://infra-reports.apache.org/site-source/
> and enter 'ant.apache.org' in the search box 'Find a web site'
>
> You should see 4 entries at different levels derived from different SVN URLs.
> Note that in each case the parent SVN tree does not have an entry for
> its children.
> E.g. The SVN location for ant.apache.org does not have a directory
> easyant or ivy.
>
> Note that .asf.yaml does not apply to SVN checkouts.
>
> If we were to determine that Git was better for the proper websites,

How to do that without a playground for testing?

> we would need to ensure that the existing site is removed from SVN
> before trying to replace it with one from Git.

That's why we should be able to test the full set of options.  Is it
possible to "publish" to some place that would not interfere with
the current setup (and its adapting to INFRA's requirements)?

Could we perhaps use the "subdir" feature (see below the copy/paste
of INFRA's instructions) in order to publish to an alternative page:

https://commons.apache.org
  -> current top-level (still using SVN)
https://commons.apache.org/git-site-commons
  -> alternative top-level (testing ".asf.yaml", publishing to given "subdir")
https://commons.apache.org/math
  -> current site for component (using SVN)
https://commons.apache.org/git-site-math
  -> component's alternative site
If the git top-level site is not created yet (due to lack of time or deciding
it's not worth it), links from the alternative site would point to the current
top-level, but that would still allow each component to independently test
(at its own pace) the move to "git".

> > >
> > > ==
> > >
> > > If there is a desire to use Git for the component websites, I suggest
> > > you try creating a couple of branches in the commons-math repository:
> > > asf-staging and asf-site.
> > >
> > > See if it is easy to create the staging site and then commit it to the
> > > asf-site branch.
> >
> > Did I misunderstand that the above documentation assumes a
> > dedicated "git" repository for the web site's contents?
>
> Not sure what documentation you are referring to.

From the link above, here is the part that concerns web sites
publication:
---CUT---

Web Site Deployment Service for Git Repositories

The staging and publish features of the .asf.y

Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-26 Thread Gilles Sadowski
>>> [...]
> >
> > OK to create the
> > commons-site
> > "git" repository?
>
> Are you offering to do the work?

If the option is still on the table, I could test the
website-related feature of ".asf.yaml":

https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#git.asf.yamlfeatures-BranchProtection

>
> BTW, I have found out that it is possible to combine site content from
> SVN and Git repos in order to create the website checkout.
> So there is no need to convert to Git.

Is it the solution straightforwardly applicable to the current
setup of the Commons web site? [So that ".asf.yaml" should
not be used for the projects' sites.]

>
> ==
>
> If there is a desire to use Git for the component websites, I suggest
> you try creating a couple of branches in the commons-math repository:
> asf-staging and asf-site.
>
> See if it is easy to create the staging site and then commit it to the
> asf-site branch.

Did I misunderstand that the above documentation assumes a
dedicated "git" repository for the web site's contents?
Or is this suggested only for an overhaul of how the sites are
built?  I.e. do we need/want something like
https://gitbox.apache.org/repos/asf?p=logging-site.git
https://gitbox.apache.org/repos/asf?p=logging-log4j-site.git;a=tree
or not?

Strangely, there is something there:
https://commons.staged.apache.org/

>
> I suspect it won't be significantly easier than the current process.
>
> However do *not* publish the asf-site branch as that will likely mess
> up the commons website; this may require Infra involvement to recover
> things.
>




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-26 Thread Gilles Sadowski
On Mon, 26 Apr 2021 at 17:08, Ralph Goers wrote:
>
> See below
>
> > On Apr 18, 2021, at 3:21 PM, sebb  wrote:
> >
> > On Sun, 18 Apr 2021 at 18:40, Gilles Sadowski wrote:
> >>
> >> On Sun, 18 Apr 2021 at 15:38, sebb wrote:
> >>>
> >>> On Sun, 18 Apr 2021 at 13:40, Gary Gregory  wrote:
> >>>>
> >>>> Note that git also has its gitlink and sub modules features that we could
> >>>> use here.
> >>>
> >>> Are they easy to use?
> >>> Who is going to design and test the replacement?
> >>> Will such a design really be easier to use?
> >>> There's no point changing the publication strategy if it is not an 
> >>> improvement.
> >>
> >> Quoting Ralph Goers:
> >> ---CUT---
> >> When I release Log4j I run mvn site followed by "mvn site:stage
> >> -DstagingDirectory=$HOME/log4j" on my laptop. I validate the site
> >> locally and then zip the site, cd to my logging-log4j-site project and
> >> unzip it where I want it to go.
> >> ---CUT---
> >>
> >> Is that the "publication strategy" which you think is not worth
> >> changing to?
> >>
> >> That's not more complicated than what I do now (mentioned in the
> >> other thread).
> >
> > AFAIK the steps you mention in the other thread can be replaced by:
> >
> > $ mvn clean site-deploy # for single module components
> > OR
> > $ mvn clean site site:stage scm-publish:publish-scm # multi module
> >
> > I'm not sure that the proposed method is no more complicated than the
> > present arrangements.
> >
> > The proposal would be two local workspaces to maintain, and two repos
> > for each component.
> >
> > There's also the issue that most of the poms would likely need
> > changing, and the change would not be as simple as changing a URL.
>
> If you use "mvn site:stage -DstagingDirectory=wherever/my/local/site/is" then
> you don’t need to change the poms.
>
> >
> > As well as setting up all the extra Git branches and/or repos.
> >
> > I don't know if a website can be served from a combination of SVN and
> > Git sources, so the top-level website would need to be converted to
> > Git, and something done about the dormant and sandbox sites - probably
> > would need at least one more Git repo to hold them.
>
> Why wouldn’t the dormant and sandbox sites just be part of the main web site?
>
> >
> > The only advantage I can see is that there could be a public staging
> > repo for each site.
> >
> > Is that worth all the extra setup?
> >
> > And who is doing the work?
>
> Well, someone will have to volunteer.

OK to create the
commons-site
"git" repository?

Gilles




Re: The case for a Commons component

2021-04-26 Thread Gilles Sadowski
On Mon, 26 Apr 2021 at 17:08, Ralph Goers wrote:
>
> How many committers will be active for this component?

No fewer than there were for [RNG], [Numbers] and [Geometry]. ;-)

Those new components have attracted high-quality contributions;
two of the people who provided them have become committers.

Gilles

> > [...]




Re: The case for a Commons component

2021-04-26 Thread Gilles Sadowski
On Sun, 25 Apr 2021 at 16:27, sebb wrote:
>
> I assume this thread is about the possible ML component.

I hesitated with Subject: "The case for *any* Commons component".

> If the code was developed by Commons, I assume it could be used as
> part of Spark.
> However Commons does not currently have many developers who are
> familiar with the field.
> So it would seem to me better to have development done by a project
> which does have relevant experience.

I expressed the same concern/opinion; in fact, if I were tempted
to implement something of the like now, I would probably indeed
start experimenting with Spark. [CM's implementation of SOFM
dates from early 2014.]

On the other hand, several people (at different times) expressed
an interest in having such codes free of the "high-level" features
that come with the "platforms".
My own current usage of the "neuralnet" package does not
warrant a move to Spark.
I'm also interested in refactoring the "clustering" package (but will
not pursue it alone).

> You say that Spark etc have lots of jars.
> Surely that allows for it to be implemented as a separate jar which
> can either be used as part of the Spark platform, or used
> independently?

https://spark.apache.org/docs/latest/spark-standalone.html

TL;DR; but there are many references to a "cluster", so that seems
the common use-case, while code here could for example focus on
multi-thread-ready components (primarily targeting applications that
run on a single multi-core machine).

> The only other option I see is for Commons to persuade some developers
> who are familiar with the field to join Commons to assist with the
> algorithms.
> Existing Commons developers can help manage the logistics of packaging
> and releasing the code, as this does not require in depth knowledge of
> the design.
> However this only makes sense if the developers skilled in the area are
> prepared to assist long-term.

I try to make that crystal-clear to every new contributor (cf. proposal to
revive "Commons Graph", the exchange on refactoring the "clustering"
package, the necessary features for a GA implementation that purports
to be more than a toy example, ...).

However, it is obviously impossible to enforce something as "prepared
to assist long-term"; it is rightfully a necessary condition for being
granted commit access, but it's up to the project to create a "place"
where people want to stay (and know what to expect).
For people interested in "ML" (not necessarily experts: They could be
developers willing to implement standard algorithms, as we did in CM),
it means that there should be global guidelines (like there were for CM)
such as e.g. "multi-thread-ready" (in addition to the usual "full doc",
"full coverage", etc.), and a repository for those codes.

We don't have much grasp on the arrival rate of contributors but I
contend that a component with a specific scope is much more
appealing (especially to newcomers) than a mixed bag à la CM
which nobody here is able (or willing) to maintain (and the reason
why I'll only merge bug-fixes).

Not creating the "place" will of course pave the way to a self-fulfilling
prophecy.

Gilles

> On Sat, 24 Apr 2021 at 23:32, Paul King  wrote:
> >
> > Thanks Gilles,
> >
> > I can provide the same sort of stats across a clustering example
> > across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> > Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> > would no doubt lead to similar conclusions.
> >
> > Cheers, Paul.
> >
> > On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski  
> > wrote:
> > >
> > > Hello Paul.
> > >
> > > > On Sat, 24 Apr 2021 at 04:42, Paul King wrote:
> > > >
> > > > I added some more comments relevant to if the proposed algorithm
> > > > belongs somewhere in the commons "math" area back in the Jira:
> > > >
> > > > https://issues.apache.org/jira/browse/MATH-1563
> > >
> > > Thanks for a "real" user's testimony.
> > >
> > > As the ML is still the official forum for such a discussion, I'm quoting
> > > part of your post on JIRA:
> > > ---CUT---
> > > For linear regression, taking just one example dataset, commons-math
> > > is a couple of library calls for a single 2M library and solves the
> > > problem in 240ms. Both Ignite and Spark involve "firing up the
> > > platform" and the code is more complex for simple scenarios. Spark has
> > > a 181M footprint across 210 jars and solves the problem in about 20s.
> 

Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-26 Thread Gilles Sadowski
Hi.

[Not replying to the previously last message in the thread but a
heads-up following Bruno's reminder about ".asf.yaml" and Ralph's
mention of it earlier in this thread.]

Could we start from the bottom-up, i.e. create a git repository for
hosting each component's web site?
Then we can gradually adapt the links of the top-level Commons
site to point to the new locations.

Gilles

> [...]




Re: The case for a Commons component

2021-04-25 Thread Gilles Sadowski
On Sun, 25 Apr 2021 at 00:32, Paul King wrote:
>
> Thanks Gilles,
>
> I can provide the same sort of stats across a clustering example
> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> would no doubt lead to similar conclusions.

There also were relatively recent discussions concerning the codes in
the "o.a.c.m.ml.clustering" package.[1]
If they are useful as of the old CM v3.6.1, they can very probably be
improved upon in terms of flexibility[2] and performance through (a.o.
things) multi-threading (in much the same way as for GA, I guess).

Best regards,
Gilles

[1] https://issues.apache.org/jira/browse/MATH-1515
[2] Fixes and enhancements are already in CM "master" branch.

>
> Cheers, Paul.
>
> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski  wrote:
> >
> > Hello Paul.
> >
> > > On Sat, 24 Apr 2021 at 04:42, Paul King wrote:
> > >
> > > I added some more comments relevant to if the proposed algorithm
> > > belongs somewhere in the commons "math" area back in the Jira:
> > >
> > > https://issues.apache.org/jira/browse/MATH-1563
> >
> > Thanks for a "real" user's testimony.
> >
> > As the ML is still the official forum for such a discussion, I'm quoting
> > part of your post on JIRA:
> > ---CUT---
> > For linear regression, taking just one example dataset, commons-math
> > is a couple of library calls for a single 2M library and solves the
> > problem in 240ms. Both Ignite and Spark involve "firing up the
> > platform" and the code is more complex for simple scenarios. Spark has
> > a 181M footprint across 210 jars and solves the problem in about 20s.
> > Ignite has a 87M footprint across 85 jars and solves the problem in >
> > 40s. But I can also find more complex scenarios which need to scale
> > where Ignite and Spark really come into their own.
> > ---CUT---
> >
> > A similar rationale was behind my developing/using the SOFM
> > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > proof of concept, and taking the "lightweight" path seemed more
> > effective than experimenting with those platforms.
> > Admittedly, at that time, there were people around who were
> > maintaining the clustering and GA codes; hence, the prototyping
> > of a machine-learning library didn't look strange to anyone.
> >
> > Regards,
> > Gilles
> >
> > >>> [...]




Re: [RNG][Geometry] TriangleSampler and other shape samplers

2021-04-24 Thread Gilles Sadowski
On Sat, 24 Apr 2021 at 23:32, Alex Herbert wrote:
>
> On Sat, 24 Apr 2021 at 17:36, Matt Juntunen 
> wrote:
>
> > This is very interesting. With this amount of geometry code, I could
> > picture this as part of a commons-geometry-rng module. The SurfaceSampler,
> > for example, could make use of the TriangleMesh and Triangle3D classes.
> > This would also be an opportunity to expand support for N-dimensional
> > points and vectors.
> >
>
> commons-geometry-sampling?

+1 for a new module.

>
> I like the idea of using geometry as a framework for validating the
> shapes that are input.

A new module would indeed make it possible to have it depend on
(some part of) [Geometry], while not imposing those dependencies
on users of the low(er)-level sampling utilities.

> Anyway I suggest to move the UnitBallSampler to a o.a.c.rng.sampling.shape
> package and to start adding items there. When the samplers are done then
> the package can be reviewed and the options with regard to geometry can
> be discussed, e.g. a geometry module using it, or the functionality being
> moved to geometry.

I don't think we'd ever regret having an additional module.

Regards,
Gilles




The case for a Commons component

2021-04-24 Thread Gilles Sadowski
Hello Paul.

On Sat, 24 Apr 2021 at 04:42, Paul King wrote:
>
> I added some more comments relevant to if the proposed algorithm
> belongs somewhere in the commons "math" area back in the Jira:
>
> https://issues.apache.org/jira/browse/MATH-1563

Thanks for a "real" user's testimony.

As the ML is still the official forum for such a discussion, I'm quoting
part of your post on JIRA:
---CUT---
For linear regression, taking just one example dataset, commons-math
is a couple of library calls for a single 2M library and solves the
problem in 240ms. Both Ignite and Spark involve "firing up the
platform" and the code is more complex for simple scenarios. Spark has
a 181M footprint across 210 jars and solves the problem in about 20s.
Ignite has a 87M footprint across 85 jars and solves the problem in >
40s. But I can also find more complex scenarios which need to scale
where Ignite and Spark really come into their own.
---CUT---

A similar rationale was behind my developing/using the SOFM
functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
proof of concept, and taking the "lightweight" path seemed more
effective than experimenting with those platforms.
Admittedly, at that time, there were people around who were
maintaining the clustering and GA codes; hence, the prototyping
of a machine-learning library didn't look strange to anyone.

Regards,
Gilles

>>> [...]




Re: [RNG][Geometry] TriangleSampler and other shape samplers

2021-04-23 Thread Gilles Sadowski
On Fri, 23 Apr 2021 at 23:42, Alex Herbert wrote:
>
> I recently added a UnitBallSampler to the sampling module to sample
> coordinates inside a unit ball. I also have a working TriangleSampler to
> sample within a triangle and intend to create a TetrahedronSampler to
> sample within a tetrahedron.
>
> Currently in the released version (1.3) we only have a UnitSphereSampler in:
>
> o.a.c.rng.sampling
>
> The only other package is o.a.c.rng.sampling.distribution for probability
> distributions.
>
> Should new coordinate based samplers be moved to a package inside for
> example:
>
> o.a.c.rng.sampling.geometry
> o.a.c.rng.sampling.shape

Maybe.  I guess that "UnitSphereSampler" would be moved too (and thus
deprecated in the package where it currently resides).

Do you foresee many "shapes"?
And a way to "combine" them?

> These shape samplers also require a valid input for the geometry. Currently
> in my working example for the TriangleSampler I have not validated the
> input is a triangle. I state that if the points are collinear then the
> distribution of the samples is undefined. It will not be uniform on the
> line segment connecting the vertices.
>
> I think that validation of the input shape is out of scope.

I'd tend to agree; otherwise, it would duplicate code that would certainly
also be needed in [Geometry].
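
For illustration only, here is a minimal Java sketch of uniform sampling
inside a triangle using the common "reflection" construction. It is not
the [RNG] code under discussion: the class and method names are
hypothetical, and java.util.SplittableRandom merely stands in for a
"UniformRandomProvider". As suggested above, no validation is performed;
with collinear vertices the samples still fall on the segment spanned by
the vertices but, as noted in the quoted message, they are not uniformly
distributed on it.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch; not the Commons RNG "TriangleSampler". */
final class TriangleSamplingSketch {
    private TriangleSamplingSketch() {}

    /**
     * Returns a point inside the triangle (a, b, c).
     * The vertices must have the same dimension; this is not checked.
     */
    static double[] sample(SplittableRandom rng,
                           double[] a, double[] b, double[] c) {
        double s = rng.nextDouble();
        double t = rng.nextDouble();
        // (s, t) is uniform in the unit square; reflect the upper-right
        // half onto the lower-left half so that s + t <= 1.
        if (s + t > 1) {
            s = 1 - s;
            t = 1 - t;
        }
        final double u = 1 - s - t;
        // Barycentric combination of the three vertices.
        final double[] p = new double[a.length];
        for (int i = 0; i < p.length; i++) {
            p[i] = u * a[i] + s * b[i] + t * c[i];
        }
        return p;
    }
}
```

Leaving validation out keeps the sampler free of any dependency on
[Geometry], which is the trade-off discussed in this thread.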

> Are shape
> samplers themselves also out of scope for RNG and would be a better fit in
> commons Geometry?

At first sight, sampling codes would seem more out of scope in [Geometry]
than "shape" sampling in [RNG].


Regards,
Gilles




[Vote] Create repository for "machine learning" algorithms.

2021-04-21 Thread Gilles Sadowski
Hi.

[This is a reboot of the proposal for which the preceding vote
has just been cancelled.]

Name of component: "Commons Machine Learning"
Name of "git" repository: "commons-machinelearning"
Top-level package name: "org.apache.commons.machinelearning"

Component rationale and scope (copied verbatim from the
OP in the other thread):
---CUT---
Because of an offered contribution, a discussion happened on
JIRA[1] and in another thread[2] about improving the genetic
algorithm (GA) implementation currently in the
   org.apache.commons.math4.genetic
package of the "Commons Math" component.
It would make sense to group "machine learning" algorithms[3]
(to which GA belongs) within a single component, where codes from
  org.apache.commons.math4.ml.neuralnet
  org.apache.commons.math4.ml.clustering
would be moved too.
This would be the fifth (and last) component resulting from my proposal
(see e.g. [4] among other threads) for the reorganization of the "Commons
Math"[5] code base into more maintainable components[6][7][8][9], each
focused on actually related functionalities (thus *not* the wide expertise
necessary for the maintenance of a full-fledged math library).
---CUT---

You might want to read the discussion that proceeded from the
previous vote request. [10]

Please vote:
  [ ] Yes.
  [ ] No, because ...

Thanks,
Gilles

[1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1563
[2] https://markmail.org/message/dnujdcxuaq5bwuwe
[3] https://en.wikipedia.org/wiki/Machine_learning
[4] https://markmail.org/message/75vuyhzblfadc5op
[5] http://commons.apache.org/proper/commons-math/
[6] http://commons.apache.org/proper/commons-rng/
[7] http://commons.apache.org/proper/commons-numbers/
[8] http://commons.apache.org/proper/commons-geometry/
[9] http://commons.apache.org/proper/commons-statistics/
[10] https://markmail.org/message/m7cyn4aq2rg47jxk




[Cancel][Vote] Create a "machine learning" component

2021-04-21 Thread Gilles Sadowski
>>> [...]
> >
> > So currently, IIRC the tally (on creating a dedicated component) is
> >  Gilles Sadowski +1
> >  Avijit Basak +1
> >  Paul King +1
> > And several -1 on the initially suggested name; but the proposed
> > name has been changed early on to "commons-machinelearning"
> > (in order to comply with Commons' tradition of full words and
> > descriptive names).
> > [Please correct if it doesn't reflect what has been expressed.]
> >
> > Where does that lead us?
>
> With a vote thread that has been open for over 2 months that apparently 
> should have been a discussion thread. I would suggest you cancel this vote 
> and create a new Vote thread proposing commons-machinelearning.

Stopping this thread as a [vote].

Gilles




Re: [Vote] Create a "machine learning" component

2021-04-21 Thread Gilles Sadowski
On Wed, 21 Apr 2021 at 08:56, Paul King wrote:
>
> On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers  
> wrote:
> >
> > Why are y’all having a long discussion on Vote thread?

Paul King's comments are interesting information that could
bear on people's decision on the proposal (especially the
licence issue).
As for the question of whether the purported functionality would
find a better home elsewhere within the ASF, I'm not sure what would
be the conclusion (apart from Avijit Basak's plain preference (?) to
develop a standalone component, as per Commons' requirement).

>
> Fair enough. I am +1 (non-binding).

So currently, IIRC the tally (on creating a dedicated component) is
  Gilles Sadowski +1
  Avijit Basak +1
  Paul King +1
And several -1 on the initially suggested name; but the proposed
name has been changed early on to "commons-machinelearning"
(in order to comply with Commons' tradition of full words and
descriptive names).
[Please correct if it doesn't reflect what has been expressed.]

Where does that lead us?

Regards,
Gilles

>>> [...]




Re: [Vote] Create a "machine learning" component

2021-04-20 Thread Gilles Sadowski
On Tue, 20 Apr 2021 at 16:09, Avijit Basak wrote:
>
> Hi
>
>   > Did you ask "Spark" people about their opinion about it?
> -- Not yet. I am not sure what would be the right option for
> this communication. It will be good if you can approach them.

You are the one who proposes a functionality that might be of interest
to the "Spark" project, perhaps on some condition on their part which
*you* are going to have to accept (or not).

In other words: It would be useless that *I* go and tell them there exists
some code in Commons Math which they could take and adapt for their
project (they can always do that).
What might be of value to them (as to the Commons project, too), is a
contributor willing to do the necessary work to create or improve a
community-supported feature.

>   > where it can be used in real-life (performance-wise)
> applications, then you should demonstrate it
> -- Do we have any kind of performance benchmark or use case
> regarding this?

Please assume that *you* are the person with the most GA expertise
in this forum.
There certainly are unit tests for the GA functionality, but I don't think
there are benchmarks; certainly, one task would be to set up a module
for (JMH-based) experimentation.

> Once that is decided,

One mantra of ASF communities is that "those who do the work get
to decide".
[The PMC can decide (by vote) whether to accept a new component;
but it's up to you to show that it's worth it (with the risk that the PMC
won't accurately judge the contribution, unfortunately)...]

> then I can proceed with this.

There is already a long list of things that can be done.

You don't *have* to contact "Spark" if you don't feel that it's the
right project for your work.  You could just hope for the best, and
start somewhere else (modularization of Commons Math, a fork
on GitHub of CM ML-related codes, and so on).

The one thing which I won't be helping with is merging ad-hoc
GA-related changes into the current CM codebase.
This doesn't preclude that other committers might want to do that
for you; however judging by the last 5 years, I wouldn't count too
much on it. ;-)

Regards,
Gilles

>
>
> Thanks & Regards
> --Avijit Basak
>
> On Mon, 19 Apr 2021 at 18:51, Gilles Sadowski  wrote:
>
> > Hello.
> >
> > On Mon, 19 Apr 2021 at 08:35, Avijit Basak wrote:
> > >
> > > Hi
> > >
> > > >Isn't a GA inherently parallel?
> > > >If so, why not take advantage of the concurrency tools provided by the
> > JDK?
> > >   -- Are we planning to implement multi-threading for GA operations even
> > as
> > > part of a single population
> >
> > This seems an obvious improvement to our current implementation
> > (in case a chromosome's evaluation is not population-dependent).
> >
> > > or only for multi-population parallel GA.
> > >   -- We can implement different types of co-evolution as part of parallel
> > > GA. Need to decide on the corresponding strategies we are going to
> > > incorporate.
> >
> > The discussion is still about the "administrative" question of whether
> > any of this should be implemented in the "Commons" project...
> >
> > Did you ask "Spark" people about their opinion about it?
> >
> > As I said, if you are confident that you can bring our implementation to
> > a state where it can be used in real-life (performance-wise) applications,
> > then you should demonstrate it (in order to convince other people from
> > the Commons PMC that it is worth engaging in long-term maintenance).
> > AFAICT, a way to do it would be to create a GitHub project (aimed at
> > becoming a new "machine learning" component, or a maven/JPMS
> > module within Commons Math).
> >
> > Best regards,
> > Gilles




Re: [geometry] 1.0-beta2

2021-04-20 Thread Gilles Sadowski
Hi Alex.

On Tue, 20 Apr 2021 at 11:17, Alex Herbert wrote:
>
> [...]

I'm a bit lost in all these bits... ;-)

> Any opinions on how changing LinearCombination may affect Geometry? Either
> we clean up the implementation to use the fast dot2s algorithm with correct
> support for splitting 53-bit mantissas, or switch to the extended precision
> version. But I do not think we should leave it as the current
> implementation which has disadvantages against either of the alternatives.

What is your suggestion?

My impression is that [Geometry] emphasizes accuracy over ultimate
speed (as opposed to libraries used for real-time rendering, I guess).
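
To make the accuracy/speed trade-off concrete, here is a minimal Java
sketch of a compensated dot product in the style of the Ogita-Rump-Oishi
"Dot2" algorithm, using Dekker's split for the product error. It is not
the "LinearCombination" code of [Numbers]: the class and method names
are hypothetical, and the split does not guard against overflow for very
large inputs. It assumes non-empty arrays of equal length.

```java
/** Hypothetical sketch; not the Commons Numbers "LinearCombination". */
final class CompensatedDotSketch {
    /** Dekker's splitting constant: 2^27 + 1. */
    private static final double SPLITTER = (1 << 27) + 1;

    private CompensatedDotSketch() {}

    /** Plain summation of products, for comparison. */
    static double naive(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Round-off error of the product p = x * y (no overflow protection). */
    private static double productError(double x, double y, double p) {
        final double cx = SPLITTER * x;
        final double hx = cx - (cx - x);
        final double lx = x - hx;
        final double cy = SPLITTER * y;
        final double hy = cy - (cy - y);
        final double ly = y - hy;
        return lx * ly - (((p - hx * hy) - lx * hy) - hx * ly);
    }

    /** Dot product that accumulates the round-off of each operation. */
    static double dot2(double[] a, double[] b) {
        double p = a[0] * b[0];
        double s = productError(a[0], b[0], p);
        for (int i = 1; i < a.length; i++) {
            final double h = a[i] * b[i];
            final double r = productError(a[i], b[i], h);
            // Error-free transformation of the sum p + h (Knuth's TwoSum).
            final double sum = p + h;
            final double z = sum - p;
            final double q = (p - (sum - z)) + (h - z);
            p = sum;
            s += q + r;
        }
        return p + s;
    }
}
```

For the inputs a = {1e16, 1, -1e16} and b = {1, 1, 1}, the plain sum
loses the middle term to rounding while the compensated version
recovers it, at the cost of several extra floating-point operations
per term.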

However, could it be possible to leave this as a user's decision?
Quoting from Matt's tutorial:
---CUT---
Typically, users of Commons Geometry will construct a single instance
of this type for use by multiple objects throughout an entire
operation, or even application. Since we don't want our class to
assume such a heavy responsibility, we will simply accept a
DoublePrecisionContext in the constructor.
---CUT---
Would it be conceivable that the choice of the implementation
activated by a call to the "LinearCombination" facility is also
encapsulated in the "DoublePrecisionContext"?

Regards,
Gilles




Re: [geometry] 1.0-beta2

2021-04-19 Thread Gilles Sadowski
On Mon, 19 Apr 2021 at 20:26, Matt Juntunen wrote:
>
> Hi Gilles,
>
> Are you suggesting skipping another beta version and having numbers 1.0 and 
> geometry 1.0 be the next releases?

Yes, that's the question which I'm asking.
There is no point in waiting much longer for feedback that
may never come...
Of course, we aim for the perfect design but if we get it wrong
and must evolve the library in a non-compatible way, all that
will happen is that the base package will change name.

> I could get on board with that. It would be great to have an
> official release of these.

Well, it's up to the main contributor(s) to let us know when it
seems that the design is good enough given the knowledge
shared within the current community.

> What needs to be done on numbers before we're ready for
> 1.0 (aside from moving over some code from geometry)?

The most basic utilities haven't fundamentally changed.  It
will be nice to increase the visibility of the many consistency
and performance improvements.
Modules to perhaps be left out are also TBD (in another thread).

>
> On another note, I don't feel like the enclosing and hull
> modules in geometry are quite ready for prime time yet.
> So, I would leave those out of 1.0.

That would be quite fine, I think.

Gilles

>> [...]




Re: [geometry] 1.0-beta2

2021-04-19 Thread Gilles Sadowski
Hello Matt.

On Mon, 19 Apr 2021 at 15:20, Matt Juntunen wrote:
>
> I'd like to release commons-geometry 1.0-beta2 within the next couple of 
> weeks. I'm planning on including GEOMETRY-118 (additional methods for 
> transform matrix classes) in that if possible. Is there anything else anyone 
> would like to include?

Thanks a lot for your work on [Geometry].

Could we perhaps make progress with the issue of what code can
be moved to [Numbers]?
The rationale is that some projects might be unwilling to allow "beta" code
in their dependencies (if only because there is the risk of JAR hell, which
we explicitly permit in beta releases).  Since it isn't likely that additional
feedback will come about the beta components, it would be good to
plan for an "official" release of [Numbers] and [Geometry], in that order.
[It's not mandatory to include all modules.]

WDYT?

Gilles




Re: [Vote] Create a "machine learning" component

2021-04-19 Thread Gilles Sadowski
Hello.

On Mon, 19 Apr 2021 at 08:35, Avijit Basak wrote:
>
> Hi
>
> >Isn't a GA inherently parallel?
> >If so, why not take advantage of the concurrency tools provided by the JDK?
>   -- Are we planning to implement multi-threading for GA operations even as
> part of a single population

This seems an obvious improvement to our current implementation
(in case a chromosome's evaluation is not population-dependent).
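
As a rough illustration of that point, here is a minimal Java sketch
that evaluates the fitness of every chromosome of one population
concurrently with the JDK's "ExecutorService". It is not the Commons
Math GA API: the class name, the method and the generic chromosome type
are hypothetical, and the fitness function is assumed to be thread-safe
and independent of the rest of the population.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch; not part of the Commons Math GA code. */
final class ParallelFitnessSketch {
    private ParallelFitnessSketch() {}

    /** Returns the fitness values in the same order as the population. */
    static <C> double[] evaluate(List<C> population,
                                 ToDoubleFunction<C> fitness,
                                 int threads) {
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One independent task per chromosome.
            final List<Future<Double>> pending = new ArrayList<>(population.size());
            for (C chromosome : population) {
                final Callable<Double> task = () -> fitness.applyAsDouble(chromosome);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order.
            final double[] result = new double[pending.size()];
            for (int i = 0; i < result.length; i++) {
                result[i] = pending.get(i).get();
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during evaluation", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Fitness evaluation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would probably reuse one pool across generations
rather than create it for each call; the sketch only shows that the
evaluation step parallelizes trivially under the stated assumption.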

> or only for multi-population parallel GA.
>   -- We can implement different types of co-evolution as part of parallel
> GA. Need to decide on the corresponding strategies we are going to
> incorporate.

The discussion is still about the "administrative" question of whether
any of this should be implemented in the "Commons" project...

Did you ask "Spark" people about their opinion about it?

As I said, if you are confident that you can bring our implementation to
a state where it can be used in real-life (performance-wise) applications,
then you should demonstrate it (in order to convince other people from
the Commons PMC that it is worth engaging in long-term maintenance).
AFAICT, a way to do it would be to create a GitHub project (aimed at
becoming a new "machine learning" component, or a maven/JPMS
module within Commons Math).

Best regards,
Gilles




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-18 Thread Gilles Sadowski
On Sun, 18 Apr 2021 at 15:38, sebb wrote:
>
> On Sun, 18 Apr 2021 at 13:40, Gary Gregory  wrote:
> >
> > Note that git also has its gitlink and sub modules features that we could
> > use here.
>
> Are they easy to use?
> Who is going to design and test the replacement?
> Will such a design really be easier to use?
> There's no point changing the publication strategy if it is not an 
> improvement.

Quoting Ralph Goers:
---CUT---
When I release Log4j I run mvn site followed by "mvn site:stage
-DstagingDirectory=$HOME/log4j" on my laptop. I validate the site
locally and then zip the site, cd to my logging-log4j-site project and
unzip it where I want it to go.
---CUT---

Is that the "publication strategy" which you think is not worth
changing to?

That's not more complicated than what I do now (mentioned in the
other thread).
IIUC, the "zip" step could be skipped altogether by setting the
"stagingDirectory" directly to be the site repository (?).

Gilles

> We do at least have a way forward if Infra insist on removing
> websites/production.
> Simple to implement, but tedious, as nearly every proper component POM
> will need updating, and existing checkouts will need replacing.
> At least it's a one-off change, and it won't change processes, except
> perhaps for the top-level site.
>
>
> > Gary
> >
> >
> > On Sun, Apr 18, 2021, 08:27 Gilles Sadowski  wrote:
> >
> > > On Sun, 18 Apr 2021 at 12:51, sebb wrote:
> > > >
> > > > On Sun, 18 Apr 2021 at 00:03, Ralph Goers 
> > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > > On Apr 17, 2021, at 3:32 PM, sebb  wrote:
> > > > > >
> > > > > > > On Sat, 17 Apr 2021 at 22:57, Ralph Goers wrote:
> > > > > > >>
> > > > > > >>
> > > > > > >> When I release Log4j I run mvn site followed by "mvn site:stage
> > > > > > >> -DstagingDirectory=$HOME/log4j" on my laptop. I validate the site locally
> > > > > > >> and then zip the site, cd to my logging-log4j-site project and unzip it
> > > > > > >> where I want it to go.
> > > > > >
> > > > > > In the Wiki that process is described as follows:
> > > > > >
> > > > > > "3. Add the new site under the content directory (or a subdirectory
> > > of
> > > > > > that as appropriate)."
> > > > > >
> > > > > > This leaves out all the detail, making it seem simpler than it is.
> > > > > >
> > > > > > We don't have to do that zip dance currently, because the
> > > > > > site-content/ directory is checked out in the workspace.
> > > > > > So the site can be built directly into the target.
> > > > >
> > > > > Yes, a bit more explanation certainly would be helpful. I didn’t
> > > understand it either when I read it until I looked at the .asf.yaml files
> > > in the subproject.
> > > > >
> > > > > Yes, if you want to build the site directly into the target that
> > > shouldn’t be a problem.
> > > >
> > > > Maybe, but Git is less flexible when it comes to partial checkouts.
> > > >
> > > > > Hopefully the information I’ve provided about how the git-based site
> > > support with .asf.yaml files will be helpful.
> > > >
> > > > I'm not sure it will simplify matters for Commons, given the number of
> > > > components that it has.
> > > > Do we really want to set up -site repos for 50+ components?
> > >
> > > How about
> > >  * 1 repository for "proper"
> > >  * 1 repository for "sandbox"
> > >  * 1 repository for "dormant"
> > > ?
> > >
> > > > Also, the dormant and snapshot components are still in SVN, so we need
> > > > to allow for that.
> > >
> > > What do you mean by "snapshot component"?
> > >
> > > >
> > > > > I had to spend quite a bit of time figuring all this out on my own as
> > > the documents I linked to are even less clear than the Logging confluence
> > > page.
> > > >
> > > > .asf.yaml is quite neat, but there are a lot of possibilities for
> > > > confusion and error, especially if we end up with many more repos.
> > > >
> > > > > Ralph
> > > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
>
>




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-18 Thread Gilles Sadowski
On Sun, 18 Apr 2021 at 12:51, sebb wrote:
>
> On Sun, 18 Apr 2021 at 00:03, Ralph Goers  wrote:
> >
> >
> >
> > > On Apr 17, 2021, at 3:32 PM, sebb  wrote:
> > >
> > > > On Sat, 17 Apr 2021 at 22:57, Ralph Goers wrote:
> > >>
> > >>
> > > >> When I release Log4j I run mvn site followed by "mvn site:stage
> > > >> -DstagingDirectory=$HOME/log4j" on my laptop. I validate the site
> > > >> locally and then zip the site, cd to my logging-log4j-site project and
> > > >> unzip it where I want it to go.
> > >
> > > In the Wiki that process is described as follows:
> > >
> > > "3. Add the new site under the content directory (or a subdirectory of
> > > that as appropriate)."
> > >
> > > This leaves out all the detail, making it seem simpler than it is.
> > >
> > > We don't have to do that zip dance currently, because the
> > > site-content/ directory is checked out in the workspace.
> > > So the site can be built directly into the target.
> >
> > Yes, a bit more explanation certainly would be helpful. I didn’t understand 
> > it either when I read it until I looked at the .asf.yaml files in the 
> > subproject.
> >
> > Yes, if you want to build the site directly into the target that shouldn’t 
> > be a problem.
>
> Maybe, but Git is less flexible when it comes to partial checkouts.
>
> > Hopefully the information I’ve provided about how the git-based site 
> > support with .asf.yaml files will be helpful.
>
> I'm not sure it will simplify matters for Commons, given the number of
> components that it has.
> Do we really want to set up -site repos for 50+ components?

How about
 * 1 repository for "proper"
 * 1 repository for "sandbox"
 * 1 repository for "dormant"
?

> Also, the dormant and snapshot components are still in SVN, so we need
> to allow for that.

What do you mean by "snapshot component"?

>
> > I had to spend quite a bit of time figuring all this out on my own as the 
> > documents I linked to are even less clear than the Logging confluence page.
>
> .asf.yaml is quite neat, but there are a lot of possibilities for
> confusion and error, especially if we end up with many more repos.
>
> > Ralph
>




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-16 Thread Gilles Sadowski
Hello.

On Fri, 16 Apr 2021 at 20:39, Ralph Goers wrote:
>
> FYI - I did the work of moving Logging Services site from the CMS to git. It
> really wasn’t that hard. The main web site is at
> https://github.com/apache/logging-site.

So (IIUC this time), we can get things going by requesting/creating a new
"git" repository that would be called "commons-site"?

> Each of the subprojects has its own site, such as
> https://github.com/apache/logging-log4j-site.

Is this an independent "git" repository?
Do we also create those as we would a normal repository?

I see that the log4j "components" are under
   https://github.com/apache/logging-site/tree/master/docs/projects

And there is only one file (".asf.yaml") in
   https://github.com/apache/logging-log4j-site

> Although the Logging Services site is small the Log4j site is very large. I
> can tell you that publishing the web site for each new release is orders of
> magnitude faster than SVN was. I did have to modify how the logging services
> site gets built but all the subprojects use the Maven site plugin.

As noted previously, we seem to use that too in (all?) Commons components
  $ mvn site

But, how does one go from the web files created in the target/site/staging
directory, to them being moved (?) to the site repository?[1]

Regards,
Gilles

[1] The "Manage the Git Hosted Web Site" link on
 https://github.com/apache/logging-site
points to
 https://cwiki.apache.org/confluence

>
> Ralph
>
>
>
> > On Apr 16, 2021, at 5:27 AM, sebb  wrote:
> >
> > On Thu, 15 Apr 2021 at 14:41, Gilles Sadowski  wrote:
> >>
> >> Hello.
> >>
> >> [Sorry for jumping into the discussion while missing the meaning of
> >> most of what is being said (and cutting it).]
> >
> > In future please start a new thread in such cases.
> >
> >>> [...]
> >>>> So why cause additional work for projects that no longer use the CMS?
> >>>
> >>> I repeat, projects hopped on to the SVN area of the CMS , that is 
> >>> unsupported
> >>> and should not have been allowed to happen, it was a workaround by 
> >>> projects
> >>> undocumented to support mainly javadocs etc from what I gather.
> >>>
> >>> You caused the additional work yourselves in the beginning by not fully 
> >>> removing
> >>> from the CMS and all its infrastructure. Infra wants to clear out that 
> >>> area as part
> >>> of migrating away and provides a new space.
> >>
> >> From what I recollect, each of the "Commons" projects (component) has its
> >> own "trunk" area that is now a "git" repository.
> >> "trunk" contains a sub-directory under SVN named "site-content".[1]
> >> For quite some time now, the only thing I'm doing with this directory is 
> >> along
> >> the following:
> >> $ mvn site site:stage
> >> $ cd site-content
> >> $ rm -rf *
> >> $ cp -r ../target/staging/* .
> >> ["svn add" for added files, "svn del" for removed files...]
> >> $ svn commit
> >> And the web site for that component was updated.
> >>
> >> Is "site-content" being replaced by another location?
> >> Is the consequence that in each component we'll have to
> >> $ svn co https://new_location_of_site_content site-content
> >> ?
> >
> > Yes, that is what Infra want people to do.
> > Effectively to rename
> >
> > https://svn.apache.org/repos/infra/websites/production/commons/content/proper/commons-math
> > as
> > https://svn.apache.org/repos/infra/sites/commons/content/proper/commons-math
> >
> >> Could we perhaps take this opportunity to do away with SVN
> >> and "site-content" and have some "mvn" target directly populate
> >> the web site?
> >
> > That would be a good idea, but will likely take more than 30 days to
> > design and test.
> >
> > The Commons website consists of lots of different parts which are
> > separately built.
> >
> > The overall website is served from
> >
> > https://svn.apache.org/repos/infra/websites/production/commons/content/
> >
> > The component sites are committed to the appropriate subtree, so when
> > the whole is checked out it all fits together.
> >
> >> Regards,

Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-16 Thread Gilles Sadowski
Hi.

On Fri, 16 Apr 2021 at 15:15, sebb wrote:
>
> On Fri, 16 Apr 2021 at 13:40, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Fri, 16 Apr 2021 at 14:28, sebb wrote:
> > >
> > > On Thu, 15 Apr 2021 at 14:41, Gilles Sadowski  
> > > wrote:
> > > >
> > > > Hello.
> > > >
> > > > [Sorry for jumping into the discussion while missing the meaning of
> > > > most of what is being said (and cutting it).]
> > >
> > > In future please start a new thread in such cases.
> > >
> > > > > [...]
> > > > > > So why cause additional work for projects that no longer use the 
> > > > > > CMS?
> > > > >
> > > > > I repeat, projects hopped on to the SVN area of the CMS , that is 
> > > > > unsupported
> > > > > and should not have been allowed to happen, it was a workaround by 
> > > > > projects
> > > > > undocumented to support mainly javadocs etc from what I gather.
> > > > >
> > > > > You caused the additional work yourselves in the beginning by not 
> > > > > fully removing
> > > > > from the CMS and all its infrastructure. Infra wants to clear out 
> > > > > that area as part
> > > > > of migrating away and provides a new space.
> > > >
> > > > From what I recollect, each of the "Commons" projects (component) has 
> > > > its
> > > > own "trunk" area that is now a "git" repository.
> > > > "trunk" contains a sub-directory under SVN named "site-content".[1]
> > > > For quite some time now, the only thing I'm doing with this directory 
> > > > is along
> > > > the following:
> > > >  $ mvn site site:stage
> > > >  $ cd site-content
> > > >  $ rm -rf *
> > > >  $ cp -r ../target/staging/* .
> > > > ["svn add" for added files, "svn del" for removed files...]
> > > >  $ svn commit
> > > > And the web site for that component was updated.
> > > >
> > > > Is "site-content" being replaced by another location?
> > > > Is the consequence that in each component we'll have to
> > > >  $ svn co https://new_location_of_site_content site-content
> > > > ?
> > >
> > > Yes, that is what Infra want people to do.
> > > Effectively to rename
> > >
> > > https://svn.apache.org/repos/infra/websites/production/commons/content/proper/commons-math
> > > as
> > > https://svn.apache.org/repos/infra/sites/commons/content/proper/commons-math
> > >
> > > > Could we perhaps take this opportunity to do away with SVN
> > > > and "site-content" and have some "mvn" target directly populate
> > > > the web site?
> > >
> > > That would be a good idea, but will likely take more than 30 days to
> > > design and test.
> > >
> > > The Commons website consists of lots of different parts which are
> > > separately built.
> > >
> > > The overall website is served from
> > >
> > > https://svn.apache.org/repos/infra/websites/production/commons/content/
> > >
> > > The component sites are committed to the appropriate subtree, so when
> > > the whole is checked out it all fits together.
> > >
> > > > Regards,
> > > > Gilles
> > > >
> > > > [1] This has always seemed like a kludge and has repeatedly
> > > > caused issues (some of which have been worked around in the
> > > > POM, IIRC).
> > >
> > > Yes, it is a bit of a kludge, but it was a reasonable solution at the 
> > > time.
> > >
> > > There are now more options, so it might be possible to improve things.
> > >
> > > But this needs some thought and planning to ensure everything fits
> > > together, and to ensure it's possible to transition without breaking
> > > the website for too long.
> > >
> > > Who is going to do the work?
> > >
> > > Can it be done and implemented in the 30 day time limit?
> > >
> >
> > IMHO, the questions are first targeted at INFRA (hence it was
> > appropriate to discuss in the original thread, if I may add);
>
> IMO it is not on-topic for the original thread.

Then I did not understand the original thread. Sorry.

> > they
> > could perhaps point to a (hopefully easy) solution that will please
> > everyone.
>
> If you want an answer from Infra you will have to raise it with them.

I won't do that since I obviously do not understand what was
being talked about.

Gilles




Re: Redesign of Commons website generation (was: CMS Deprecated. Removal of configs and move to new publishing area)

2021-04-16 Thread Gilles Sadowski
Hi.

On Fri, 16 Apr 2021 at 14:28, sebb wrote:
>
> On Thu, 15 Apr 2021 at 14:41, Gilles Sadowski  wrote:
> >
> > Hello.
> >
> > [Sorry for jumping into the discussion while missing the meaning of
> > most of what is being said (and cutting it).]
>
> In future please start a new thread in such cases.
>
> > > [...]
> > > > So why cause additional work for projects that no longer use the CMS?
> > >
> > > I repeat, projects hopped on to the SVN area of the CMS , that is 
> > > unsupported
> > > and should not have been allowed to happen, it was a workaround by 
> > > projects
> > > undocumented to support mainly javadocs etc from what I gather.
> > >
> > > You caused the additional work yourselves in the beginning by not fully 
> > > removing
> > > from the CMS and all its infrastructure. Infra wants to clear out that 
> > > area as part
> > > of migrating away and provides a new space.
> >
> > From what I recollect, each of the "Commons" projects (component) has its
> > own "trunk" area that is now a "git" repository.
> > "trunk" contains a sub-directory under SVN named "site-content".[1]
> > For quite some time now, the only thing I'm doing with this directory is 
> > along
> > the following:
> >  $ mvn site site:stage
> >  $ cd site-content
> >  $ rm -rf *
> >  $ cp -r ../target/staging/* .
> > ["svn add" for added files, "svn del" for removed files...]
> >  $ svn commit
> > And the web site for that component was updated.
> >
> > Is "site-content" being replaced by another location?
> > Is the consequence that in each component we'll have to
> >  $ svn co https://new_location_of_site_content site-content
> > ?
>
> Yes, that is what Infra want people to do.
> Effectively to rename
>
> https://svn.apache.org/repos/infra/websites/production/commons/content/proper/commons-math
> as
> https://svn.apache.org/repos/infra/sites/commons/content/proper/commons-math
>
> > Could we perhaps take this opportunity to do away with SVN
> > and "site-content" and have some "mvn" target directly populate
> > the web site?
>
> That would be a good idea, but will likely take more than 30 days to
> design and test.
>
> The Commons website consists of lots of different parts which are
> separately built.
>
> The overall website is served from
>
> https://svn.apache.org/repos/infra/websites/production/commons/content/
>
> The component sites are committed to the appropriate subtree, so when
> the whole is checked out it all fits together.
>
> > Regards,
> > Gilles
> >
> > [1] This has always seemed like a kludge and has repeatedly
> > caused issues (some of which have been worked around in the
> > POM, IIRC).
>
> Yes, it is a bit of a kludge, but it was a reasonable solution at the time.
>
> There are now more options, so it might be possible to improve things.
>
> But this needs some thought and planning to ensure everything fits
> together, and to ensure it's possible to transition without breaking
> the website for too long.
>
> Who is going to do the work?
>
> Can it be done and implemented in the 30 day time limit?
>

IMHO, the questions are first targeted at INFRA (hence it was
appropriate to discuss in the original thread, if I may add); they
could perhaps point to a (hopefully easy) solution that will please
everyone.

Gilles




Re: [jira] [Closed] (STATISTICS-30) Gain Marks in Oracle Certified - 1Z0-997-20 Dumps PDF [2021] - Prepare4test

2021-04-16 Thread Gilles Sadowski
Hello Alex.

On Fri, 16 Apr 2021 at 13:27, Alex Herbert (Jira) wrote:
>
>
>  [ https://issues.apache.org/jira/browse/STATISTICS-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Alex Herbert closed STATISTICS-30.
> --
> Resolution: Invalid

I believe those should be "deleted".
[The reason is probably to keep them from being counted in
reports (e.g. number of issues opened/closed).]

Regards,
Gilles

> > [...]




Re: CMS Deprecated. Removal of configs and move to new publishing area

2021-04-15 Thread Gilles Sadowski
Hello.

[Sorry for jumping into the discussion while missing the meaning of
most of what is being said (and cutting it).]

> [...]
> > So why cause additional work for projects that no longer use the CMS?
>
> I repeat, projects hopped on to the SVN area of the CMS , that is unsupported
> and should not have been allowed to happen, it was a workaround by projects
> undocumented to support mainly javadocs etc from what I gather.
>
> You caused the additional work yourselves in the beginning by not fully 
> removing
> from the CMS and all its infrastructure. Infra wants to clear out that area 
> as part
> of migrating away and provides a new space.

From what I recollect, each of the "Commons" projects (component) has its
own "trunk" area that is now a "git" repository.
"trunk" contains a sub-directory under SVN named "site-content".[1]
For quite some time now, the only thing I'm doing with this directory is along
the following:
 $ mvn site site:stage
 $ cd site-content
 $ rm -rf *
 $ cp -r ../target/staging/* .
["svn add" for added files, "svn del" for removed files...]
 $ svn commit
And the web site for that component was updated.
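
The copy step above can be sketched as a small helper; this is only an
illustration under assumptions, not the script I actually use. The
function name "sync_staged_site" is hypothetical, and the demonstration
runs on throw-away directories so that neither Maven nor SVN is needed.
The "svn add"/"svn del"/"svn commit" steps would still follow.

```shell
#!/bin/sh
set -eu

# Replace the visible contents of a site checkout with a freshly staged
# site.  Hidden entries in the checkout (e.g. ".svn") are left untouched
# because the "*" glob does not match dot-files.
sync_staged_site() {
    staged=$1
    checkout=$2
    [ -d "$staged" ] || { echo "no staged site: $staged" >&2; return 1; }
    mkdir -p "$checkout"
    # ":?" aborts instead of expanding to "/*" if the variable is empty.
    rm -rf "${checkout:?}"/*
    cp -r "$staged"/. "$checkout"/
}

# Self-contained demonstration on temporary directories.
demo=$(mktemp -d)
mkdir -p "$demo/target/staging/apidocs" "$demo/site-content/.svn"
echo "new" > "$demo/target/staging/index.html"
echo "doc" > "$demo/target/staging/apidocs/index.html"
echo "old" > "$demo/site-content/obsolete.html"

sync_staged_site "$demo/target/staging" "$demo/site-content"
ls -A "$demo/site-content"
```

Compared with a bare "rm -rf *", the guard on the destination path
avoids deleting from the wrong directory when a "cd" has failed.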

Is "site-content" being replaced by another location?
Is the consequence that in each component we'll have to
 $ svn co https://new_location_of_site_content site-content
?
Could we perhaps take this opportunity to do away with SVN
and "site-content" and have some "mvn" target directly populate
the web site?

Regards,
Gilles

[1] This has always seemed like a kludge and has repeatedly
caused issues (some of which have been worked around in the
POM, IIRC).




Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Gilles Sadowski
On Tue, 13 Apr 2021 at 18:21, Avijit Basak wrote:
>
> Hi
>
>   Please find my comments below.
>
> >> I don't follow the distinction "prod" vs "non-prod".
>  -- Actually in Prod we really need a very high performing system. So
> use of implicit parallelism in spark would help us to achieve it. But for
> other types of work like POC or R we may not need such performance.

Isn't a GA inherently parallel?
If so, why not take advantage of the concurrency tools provided by the JDK?

> >> the question was actually whether you are willing to modularize CM
>  -- I am not much aware of other ml components in commons. I would look
> into it.

I've mentioned them in earlier messages:
 * Self-organizing feature map (artificial neural net)
 * Clustering

The former is multi-threaded; the latter should be refactored to
take advantage of multi-threading.

> >>You did not expand about the usability/performance (e.g. the issue of
> multi-threading)
>  -- Are we planning to incorporate parallel GA.

Aren't you?

> Then multi-threading
> would be a more appropriate option.

IMHO, a necessary one.

> >> So, as a way forward, I would suggest that you create a project on
> GitHub (copying all the settings from a *Commons modular* component, such as
> "Commons Numbers")
>  -- Could you kindly share the GitHub repository URL for any Commons
> modular component.

https://github.com/apache/commons-rng
https://github.com/apache/commons-numbers
https://github.com/apache/commons-geometry
https://github.com/apache/commons-statistics

>
> Thanks & Regards
> --Avijit Basak
>
>
> On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski  wrote:
>
> > Hello.
> >
> > On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
> > >
> > > Hi
> > >
> > >  Sorry for the delayed response. Thanks for your patience. Please
> > > find my comments below:
> > >
> > >  (1) Why not Spark?  [At least post over there (?).]
> > >   --We can move to Spark. But it will be very much useful if the
> > things
> > > can also run without Spark. The use of Spark would make more sense in a
> > > production environment. But the portability of the library will be more
> > > useful for the non-prod environment.
> >
> > I don't follow the distinction "prod" vs "non-prod".
> >
> > > Definitely, we can reach the Spark
> > > team and query.
> >
> > That would be a good idea...
> >
> > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > >--I can help with the upgrade of the existing library related to
> > GA
> > > functionality.
> >
> > Sure, but nobody is currently working on (2).
> >
> > >  (3) Modularize CM? [Who will do it?]
> > >--I can help with the upgrade of the existing library related to
> > GA
> > > functionality.
> >
> > I don't doubt it; but the question was actually whether you are willing
> > to modularize CM (that is: in addition to, and before, contributing to
> > the GA functionality).
> >
> > >  (4) New component (with another name) with the proposed contents?
> > >--This is the best option if permitted.
> >
> > Currently, only the two of us are in favour of this alternative.
> >
> > Nobody, by their action, is really in favour of any of the other
> > alternatives.
> > So, as a way forward, I would suggest that you create a project on GitHub
> > (copying all the settings from a Commons modular component, such as
> > "Commons Numbers"), to be eventually integrated here, once its potential
> > has been demonstrated.
> >
> > >   The code which I have written can be reused with minor
> > modifications.
> > > So it won't take too much effort for this activity.
> >
> > You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)...
> >
> > Regards,
> > Gilles
> >
> > >> [...]
> >




Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Gilles Sadowski
Hello.

On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
>
> Hi
>
>  Sorry for the delayed response. Thanks for your patience. Please
> find my comments below:
>
>  (1) Why not Spark?  [At least post over there (?).]
>   --We can move to Spark. But it will be very much useful if the things
> can also run without Spark. The use of Spark would make more sense in a
> production environment. But the portability of the library will be more
> useful for the non-prod environment.

I don't follow the distinction "prod" vs "non-prod".

> Definitely, we can reach the Spark
> team and query.

That would be a good idea...

>  (2) Further develop a monolithic CM?  [Who will do it?]
>--I can help with the upgrade of the existing library related to GA
> functionality.

Sure, but nobody is currently working on (2).

>  (3) Modularize CM? [Who will do it?]
>--I can help with the upgrade of the existing library related to GA
> functionality.

I don't doubt it; but the question was actually whether you are willing
to modularize CM (that is: in addition to, and before, contributing to
the GA functionality).

>  (4) New component (with another name) with the proposed contents?
>--This is the best option if permitted.

Currently, only the two of us are in favour of this alternative.

Nobody, by their action, is really in favour of any of the other alternatives.
So, as a way forward, I would suggest that you create a project on GitHub
(copying all the settings from a Commons modular component, such as
"Commons Numbers"), to be eventually integrated here, once its potential
has been demonstrated.

>   The code which I have written can be reused with minor modifications.
> So it won't take too much effort for this activity.

You did not expand about the usability/performance (e.g. the issue of
multi-threading)...

Regards,
Gilles

>> [...]




Re: GitHub license display confused by LICENSE-header.txt

2021-03-09 Thread Gilles Sadowski
On Tue, 9 Mar 2021 at 11:58, sebb wrote:
>
> On Tue, 9 Mar 2021 at 01:39, Gilles Sadowski  wrote:
> >
> > On Tue, 9 Mar 2021 at 01:41, sebb wrote:
> > >
> > > Most of the Commons projects show up in GitHub as having the Apache 2.0 
> > > License
> > >
> > > However a few show up as 'other':
> > >
> > > commons-codec
> > > commons-csv
> > > commons-dbutils
> > > commons-exec
> > > commons-jelly
> > > commons-logging
> > > commons-math
> > > commons-rdf
> > > commons-text
> > > commons-weaver
> > >
> > > AFAICT this is because there is more than 1 LICENSE file at the top level:
> > > I forked codec and deleted the LICENSE-header.txt file, and the
> > > license then showed up as AL 2.0.
> >
> > For commons-math, there is a single license file; so the reason
> > must be different.  Perhaps the contents (several sections were
> > added below the license text)?
> > Should those be moved to the "NOTICE" file?
>
> I don't think that is allowed.
> NOTICE is for required attributions only.

So, after all, do you mean that
  1. GitHub is just wrong in not detecting that CM is AL2.0, and
  2. there is nothing to change for CM
?

 Gilles




Re: GitHub license display confused by LICENSE-header.txt

2021-03-08 Thread Gilles Sadowski
On Tue, 9 Mar 2021 at 01:41, sebb wrote:
>
> Most of the Commons projects show up in GitHub as having the Apache 2.0 
> License
>
> However a few show up as 'other':
>
> commons-codec
> commons-csv
> commons-dbutils
> commons-exec
> commons-jelly
> commons-logging
> commons-math
> commons-rdf
> commons-text
> commons-weaver
>
> AFAICT this is because there is more than 1 LICENSE file at the top level:
> I forked codec and deleted the LICENSE-header.txt file, and the
> license then showed up as AL 2.0.

For commons-math, there is a single license file; so the reason
must be different.  Perhaps the contents (several sections were
added below the license text)?
Should those be moved to the "NOTICE" file?

Gilles

> [...]




Re: [crypto] Interest in adding support for cryptographic hash function?

2021-02-27 Thread Gilles Sadowski
On Sat, 27 Feb 2021 at 19:00, Bernd Eckenfels wrote:
>
> Hello,
>
> I don’t think it’s a good idea to introduce native dependencies to formerly
> pure Java projects.

+1
[I thought that the idea was a (pure) Java implementation.]

> So I think native optimized hash implementations would fit better in
> commons-crypto. So I would say go for it, keep in mind license clearance and
> portability.
>
> Regards,
> Bernd
>
> > [...]




Re: [crypto] Interest in adding support for cryptographic hash function?

2021-02-27 Thread Gilles Sadowski
Hi.

On Sat, 27 Feb 2021 at 15:51, Alex Remily wrote:
>
> I'm working on a project that makes heavy use of hashing, and I'd like to
> use the OpenSSL implementation.  Thoughts on adding support for the SHA-2

http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/Sha2Crypt.html

> and/or Blake2 family into commons crypto?

[Codec] seems to be the appropriate component.

> I'm happy to do the work

Thanks!

> if
> there's someone out there willing to review and advise.
>
> Alex




Re: [STATISTICS] Truncated Normal Distribution

2021-02-19 Thread Gilles Sadowski
Hello.

On Fri, 19 Feb 2021 at 08:43, Marko Malenic wrote:
>
> Hi Gilles,
>
> Thanks, I've sent that through, and opened a pull request on the
> commons-statistics github page.

Thanks!

Nit-pick: When you've implemented the changes suggested by Alex,
please mention the JIRA identifier in the commit message, i.e.:
---
STATISTICS-27: Implement truncated normal distribution.
---
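
A minimal sketch of how such a message could be checked before
committing. The helper name "has_jira_prefix" is hypothetical, and the
pattern (uppercase project key, dash, issue number, colon, space) is my
assumption about the convention, not an enforced project rule:

```shell
#!/bin/sh
# Succeed if the first line of a commit message starts with a JIRA
# identifier followed by ": " and some text, e.g. "STATISTICS-27: ...".
has_jira_prefix() {
    printf '%s\n' "$1" | head -n 1 | grep -Eq '^[A-Z][A-Z0-9]*-[0-9]+: .+'
}

# Try one message that follows the convention and one that does not.
for msg in "STATISTICS-27: Implement truncated normal distribution." \
           "Implement truncated normal distribution."; do
    if has_jira_prefix "$msg"; then
        echo "ok:      $msg"
    else
        echo "missing: $msg"
    fi
done
```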

Gilles

>> [...]




Re: [STATISTICS] Truncated Normal Distribution

2021-02-18 Thread Gilles Sadowski
Hi.

On Fri, 19 Feb 2021 at 00:59, Marko Malenic wrote:
>
> Hi Gilles,
>
> Do I need to sign a contributor license agreement for this?

Yes, it is preferable.[1]
Thank you for asking.

Regards,
Gilles

[1] http://www.apache.org/licenses/contributor-agreements.html

> > [...]




Re: [STATISTICS] Truncated Normal Distribution

2021-02-17 Thread Gilles Sadowski
Hello.

On Wed, 17 Feb 2021 at 04:25, Marko Malenic wrote:
>
> Hi all,
>
> I've completed an implementation of a truncated normal distribution for
> commons-statistics,
> as described by:
> https://issues.apache.org/jira/browse/STATISTICS-27
> at:
> https://github.com/mmalenic/commons-statistics/tree/STATISTICS-27-TruncatedNormalDistribution
>
> When submitting a PR what is the preferred format for commits?
> Should I squash all my commits into one,

Yes.

Thanks,
Gilles

> or leave my individual commits
> (some of which do not pass build checks)?
>
> Best,
> Marko




Re: [Vote] Create a "machine learning" component

2021-02-14 Thread Gilles Sadowski
On Sun, 14 Feb 2021 at 09:06, Avijit Basak wrote:
>
> Hi
>
>I would like to mention a few points here. Genetic Algorithm has a
> vast range of applications in optimization and search problems. Machine
> learning is only one of those.
>If we couple the new GA library with any specific domain like ml it
> would be meaningless for people working in other domains.

Isn't "meaningless" a slight overstatement?
We might have an issue of terminology: There is no necessary "coupling"
but maybe "acquaintance" (for lack of a better word), as a set of tools that
might come in handy for solving certain types of problems.  [For example,
the Traveling Salesman Problem can be tackled by GA and SOFM, both
of which are candidate for inclusion in the new component, although they
don't share any code.]

If the name "machine learning" is not the most appropriate one to convey
the intended scope, do you have another idea?
["AI" would perhaps be more correct if we consider a strict hierarchy, but
would obviously be far too presumptuous.]

> They have to
> incorporate the entire ml library

No, they won't.  Given the stated goal of "modularity": the "ga" module
will be available as a dedicated JAR (possibly with a dependency to
codes that can be reused in other modules provided by the component).

> which may be completely unrelated to
> their project. Coupling it with any technology like spark might also limit
> its usability.

You may be right; I have no idea about the "restrictions" imposed by
Spark.  [It seems that in this case, one would have to indeed depend
on Spark's "mllib" (?).  This would be one reason, as I already stated,
for having something in "Commons".]

Could you elaborate on a concrete use-case where one would be
starting to develop an application with the specific requirement that
Spark could not be used?
In particular, IIRC Spark has multi-threading built in.  Don't you see
it as a huge problem that CM would not provide such a feature?

>If a separate component is not approved for this change then we can
> incorporate the changes as part of *commons.math* library.

Of course, if somebody wants to do that, he's welcome.
[That will not be me, for all the reasons which I've explained.  In the last
5 years I've been pretty much alone in handling bug reports about CM;
I'm unwilling to assume implicit support for even more codes.]

Also, with this solution, you'd now be willing to accept what you weren't
above: Anyone wanting to use the GA functionality would indeed have to
"incorporate" the whole of "Commons Math" (CM).
Of course, the latter could be modularized, but this will only mitigate the
issue, as any release of the GA functionality will potentially be then held
off by potential issues in other parts of CM (which nobody has been able
to consistently support for more than 5 years now).

>The same library can be reused in ml or neural network libraries as
> a dependency.

It is the other way around:  The development version of CM currently
depends on "lower-level" components.
Furthermore, right now its (embryonic) "machine learning" functionality
hasn't any substantial dependency on codes outside the "o.a.c.math4.ml"
package.

>Kindly share further views on this.

In summary, to be clarified:
 (1) Why not Spark?  [At least post over there (?).]
 (2) Further develop a monolithic CM?  [Who will do it?]
 (3) Modularize CM? [Who will do it?]
 (4) New component (with another name) with the proposed contents?

To make things clear from my side:  As a *user*, I currently have some
stake in having a clean, independent "ml" component or an independent
"sofm" module.  So I could do (4).  Or help with (3), on the condition that
*other* people get things moving.

Regards,
Gilles

>
> Thanks & Regards
> --Avijit Basak
>
> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski  wrote:
>
> > On Wed, 10 Feb 2021 at 13:19, sebb wrote:
> > >
> > > Likewise, commons-ml is too cryptic.
> > >
> > > Also, the Spark project has a machine-learning library:
> > >
> > > https://spark.apache.org/mllib/
> >
> > Thanks for the pointer.
> >
> > >
> > > Maybe that would be better home?
> >
> > On the face of it, probably.
> > [For sure, Avijit should comment on the suggestion.]
> >
> > On the other hand, "Commons" is the place where one can pick "bare
> > bone" implementations, and add the functionality to one's application
> > without necessarily complying with an overarching framework.
> > [I don't mean that framework compliance is bad; quite the contrary, it is
> > hopefully the result of a

Re: [Vote] Create a "machine learning" component

2021-02-10 Thread Gilles Sadowski
On Wed, 10 Feb 2021 at 13:19, sebb wrote:
>
> Likewise, commons-ml is too cryptic.
>
> Also, the Spark project has a machine-learning library:
>
> https://spark.apache.org/mllib/

Thanks for the pointer.

>
> Maybe that would be better home?

On the face of it, probably.
[For sure, Avijit should comment on the suggestion.]

On the other hand, "Commons" is the place where one can pick "bare
bone" implementations, and add the functionality to one's application
without necessarily complying with an overarching framework.
[I don't mean that framework compliance is bad; quite the contrary, it is
hopefully the result of a thorough reflection by experts.  But ... cf. the
numerous "no-dependency" discussions ...]

Actually, concerning Avijit's proposed contribution, didn't I say:[1]
---CUT---
Thus, I think that we must assess whether the "genetic algorithms"
functionality has a reasonable future within "Apache Commons" (i.e.
potential users and contributors) while there exist other libraries that
seem much more advanced for any serious usage.
---CUT---

> I'm also a bit concerned as to whether there are sufficient developers
> here with knowledge of the ML domain to be able to support the code in
> the future.

An interesting point; by all means not a new one (see e.g. [2]).

Isn't it the same point I've been making about "Commons Math" (CM)?
There have been no releases because nobody here is able (or willing)
to support it.

Concerning the support of the purported "machinelearning" component:
1. Package
org.apache.commons.math4.ml.neuralnet
* I've written it entirely and I have applications that depend on it (and I
  cannot assume that I could easily switch to, or port it to, Spark), so I
  can reasonably ensure that it would be supported.
2. Package
org.apache.commons.math4.ml.clustering
* Functionality is mentioned in Spark's "mllib" user guide.
* When a new feature was last contributed[3], it was noticed[4][5][6]
  that improvements were needed (but there was no follow-up).
* I have an application that depends on it (from CM v3.6.1) but I wouldn't
  support it if shipped in CM v4.0.
3. Package
org.apache.commons.math4.genetics
* Part of my "end-of-study" project consisted of a GA implementation.
  I've never used the CM implementation, and I don't deny that there
  could be perfectly fine uses of it but, just looking at the code, it seems
  obvious that it cannot compete feature-wise with other libraries out
  there.
* I've suggested long ago that, without anyone supporting it actively (and
  no known user community), it should be dropped from CM.
* Avijit expressed a willingness to improve the functionality:  Is this
  enough for the PMC to create a new component?  From the experience with
  the "clustering" package mentioned above, I'd tend to think (unfortunately)
  that it isn't.  He should first explore whether the Spark community is
  interested in having the GA functionality moved over there.

Gilles

[1] https://issues.apache.org/jira/browse/MATH-1563
[2] https://markmail.org/message/26yxj5vhysdsoety
[3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
[4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
[5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
[6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526

>
> On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg  wrote:
> >
> > -1 for commons-ml for the same reasons.
> >
> > What about commons-machine-learning or commons-math-learning? The latter
> > is as long as commons-configuration.
> >
> > Emmanuel Bourg
> >
> >
> > On 2021-02-10 03:27, Ralph Goers wrote:
> > > -1 on commons-ml as the name. My first thought is such a repo would
> > > hold stuff related to mailing lists. Then again maybe it contains
> > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > the ML Programming Language [1].
> > >
> > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > be +0 since it is still not obvious what it would contain.
> > >
> > > Ralph




Re: [Vote] Create a "machine learning" component

2021-02-10 Thread Gilles Sadowski
On Wed, 10 Feb 2021 at 09:27, Emmanuel Bourg wrote:
>
> -1 for commons-ml for the same reasons.
>
> What about commons-machine-learning or commons-math-learning? The latter
> is as long as commons-configuration.

Java users should be used to lengthy names.
It should thus be "commons-machinelearning" as hyphens, by convention,
separate items that become sub-packages in the Java code.
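
The naming convention mentioned above can be illustrated with a minimal sketch. It assumes that the Java package is obtained simply by prefixing "org.apache" and replacing each hyphen with a dot; the class and method names (`ArtifactNaming`, `toPackage`) are hypothetical and only serve the illustration.

```java
import java.util.Arrays;

public class ArtifactNaming {
    /** Assumed common prefix of the Java packages of "Commons" components. */
    private static final String BASE = "org.apache";

    /**
     * Maps an artifact name to the package name implied by the convention:
     * each hyphen introduces a new sub-package.
     */
    static String toPackage(String artifactId) {
        return BASE + "." + artifactId.replace('-', '.');
    }

    public static void main(String[] args) {
        for (String id : Arrays.asList("commons-machinelearning",
                                       "commons-machine-learning",
                                       "commons-math-learning")) {
            System.out.println(id + " -> " + toPackage(id));
        }
    }
}
```

Under that assumption, "commons-machine-learning" would imply a "machine" package with a "learning" sub-package, whereas "commons-machinelearning" keeps everything in a single top-level package.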

>
> Emmanuel Bourg
>
>
> On 2021-02-10 03:27, Ralph Goers wrote:
> > -1 on commons-ml as the name. My first thought is such a repo would
> > hold stuff related to mailing lists. Then again maybe it contains
> > stuff relating to markup languages. Maybe it is Apache’s version of
> > the ML Programming Language [1].

Strange rationale.  As if someone would not read the full name of a
library before deciding whether it provides what he needs...

> >
> > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > be +0 since it is still not obvious what it would contain.

As explained, this is not a useful or descriptive name: ML is not what
mathematicians would consider a part of mathematics.
ML is an area of computer science, inspired by biological processes.

Gilles

> >
> > Ralph




[Vote] Create a "machine learning" component

2021-02-09 Thread Gilles Sadowski
Hi.

Because of an offered contribution, a discussion happened on
JIRA[1] and in another thread[2] about improving the genetic
algorithm (GA) implementation currently in the
   org.apache.commons.math4.genetics
package of the "Commons Math" component.
It would make sense to group "machine learning" algorithms[3]
(to which GA belongs) within a single component, where codes from
  org.apache.commons.math4.ml.neuralnet
  org.apache.commons.math4.ml.clustering
would be moved too.
This would be the fifth (and last) component resulting from my proposal
(see e.g. [4] among other threads) for the reorganization of the "Commons
Math"[5] code base into more maintainable components[6][7][8][9], each
focused on actually related functionalities (thus *not* the wide expertise
necessary for the maintenance of a full-fledged math library).

I suggest "ML" for the name of the component.

Regards,
Gilles

[1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1563
[2] https://markmail.org/message/dnujdcxuaq5bwuwe
[3] https://en.wikipedia.org/wiki/Machine_learning
[4] https://markmail.org/message/75vuyhzblfadc5op
[5] http://commons.apache.org/proper/commons-math/
[6] http://commons.apache.org/proper/commons-rng/
[7] http://commons.apache.org/proper/commons-numbers/
[8] http://commons.apache.org/proper/commons-geometry/
[9] http://commons.apache.org/proper/commons-statistics/




Re: [lang] Introduce @NonNull, and @Nullable

2021-02-01 Thread Gilles Sadowski
On Mon, 1 Feb 2021 at 14:49, Jochen Wiedmann wrote:
>
> On Mon, Feb 1, 2021 at 2:34 PM Gary Gregory  wrote:
>
> > 1) Don't add and use custom annotations, this opens the door to each of our
> > 20+ components doing the same thing, so pick a library and stick with it.
>
> I suggested starting with lang, because that would be the template for
> most others. (At least those, who are already using lang as a
> dependency.)
>
> > 2) Make sure there are no licensing issues with that library.
>
> What licensing issues could there be with a dependency in scope
> "provided"? It is never going to be distributed, is it?
> Besides, the suggested library is available under ASL2:
> https://search.maven.org/artifact/com.google.code.findbugs/jsr305/3.0.2/jar

Potential issues are mentioned there:
https://dzone.com/articles/when-to-use-jsr-305-for-nullability-in-java

This[1] may be especially worrying (?):
---CUT---
Using jsr305 causes additional issues, if Guava is used in a modular
JDK9 applications, because it puts the annotations into
javax.annotation package, which is also used by a couple of other
JAR-s and a legacy JDK module java.xml.ws.annotation. If one wants to
create a modular JDK9 application with two dependencies to conflicting
JAR-s, Java refuses to compile and run it because of a package split.
[...]
Findbugs has been rebooted as Spotbugs and they are going to make a
switch from JSR-305 to their own internal annotations in version 4.0.0
that do not break anything [...]
---CUT---

Regards,
Gilles

[1] https://github.com/google/guava/issues/2960#issue-263628666




Re: [lang] Introduce @NonNull, and @Nullable

2021-02-01 Thread Gilles Sadowski
On Mon, 1 Feb 2021 at 14:34, Gary Gregory wrote:
>
> My concerns are:
>
> 1) Don't add and use custom annotations, this opens the door to each of our
> 20+ components doing the same thing, so pick a library and stick with it.

+1

> 2) Make sure there are no licensing issues with that library.

Or create a "Commons" component for that specific purpose (?).

Gilles

> > [...]




[Vote] New "git" repository (Was: [All][Math] New GA component)

2021-01-29 Thread Gilles Sadowski
Hello.

Assuming lazy consensus[1], please state any objection to creating a
new repository that will host the GA-related functionality (spin-off from
"Commons Math").

More context on JIRA.[2]
I'll take care of setting up the new component (a.o. moving the relevant
codes over to the new repository), after which Avijit can start improving
it.

Thanks,
Gilles

[1] https://www.apache.org/foundation/glossary.html#LazyConsensus
[2] https://issues.apache.org/jira/browse/MATH-1563

On Fri, 29 Jan 2021 at 13:20, Avijit Basak wrote:
>
> Hello Gilles
>
>  Thanks for your reply. Actually I am not very comfortable with the 
> porting process. It will be really nice if I can have an initial repository.
>
> Thanks & Regards
> --Avijit Basak
>
>>> [...]




Re: [geometry] IO Modules

2021-01-27 Thread Gilles Sadowski
Hello.

On Mon, 25 Jan 2021 at 13:40, Matt Juntunen wrote:
>
> Hello,
>
> I have two main goals for the IO modules here:
>
>   1.  Provide a simple, high-level API (i.e. IO3D) for reading and writing 
> geometry with a minimum of fuss.

Sure.  But this high-level API looks like "syntactic sugar" that can
certainly be done in a number of ways (with more or less code and/or
more or less flexibility).

>   2.  Provide a low-level, extensible API specific to each data format that 
> can be used to access addition format-specific information while reading and 
> provide greater control over the output while writing.
>
> So, there are actually two different APIs in question here. Users could use 
> the high-level API when only the geometry itself is of interest and the 
> low-level API when additional metadata is required. Useful examples of this 
> metadata are the object and group names from the OBJ format (which can be 
> used to store separate geometries in a single file) and the facet attribute 
> bytes in binary STL files (which are sometimes used to store color 
> information or other values). This information does not map directly to any 
> data structures in commons-geometry but it is certainly useful to be able to 
> access it (I will want to do so in my day job, for instance).

In effect, how would one map (application) data that is tied to a
(library) facet instance?

>
> > Such customization could also be handled at the application level through
> a (handler-specific) property file.
>
> I'd rather not deal with configuration files and keep things simple and 
> lightweight.

Maybe I was not clear because I don't see how such configuration file(s)
would make it less lightweight from a casual user's POV.  A default configuration
would be provided (that associates default extensions with the library-provided
handlers).  Users could easily append extensions and handlers, rather than
having to do it in code.  It also seems like a feature that one could disable a
handler (rather than having code always loaded for a format which the user
doesn't actually want to use).

> > Then the case for the "enum" is moot (IIUC).
>
> Yes, it might be. I would like to allow format names to be mapped to more 
> than one file extension, though.

I don't understand how "enum" and multiple extensions are related...
My remark was about the "enum" usage in order to enforce a single API
for file formats (but I understood that your use-case requires more flexibility).

>
> > User-code should be in charge of associating input (e.g. file name) with 
> > how to handle it (e.g. the instantiation of the read handler).
>
> This would be the case for the low-level API, but I want the high-level API 
> to be able to handle this itself, based on its configuration. I want to be 
> able to call 'IO3D.read(Paths.get("cube.obj"))'

This would just work (the devil being in the details) with the hypothetical
"handlers.conf" file containing:
---CUT---
formats=OBJ
OBJ.extensions=obj,OBJ
OBJ.reader=org.apache.commons.geometry.io.euclidean.threed.ObjFormatReader
---CUT---
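
How such a file could be consumed is sketched below, using only "java.util.Properties". This is an illustration under stated assumptions: the key layout ("formats", "<name>.extensions", "<name>.reader") is the hypothetical one shown above, and the class and method names (`HandlersConf`, `readersByExtension`) are made up for the example. The reader class name is kept as a string; instantiating it is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class HandlersConf {
    /**
     * Builds a map from file extension to reader class name, based on the
     * (hypothetical) keys "formats", "NAME.extensions" and "NAME.reader".
     */
    static Map<String, String> readersByExtension(Properties conf) {
        Map<String, String> readers = new LinkedHashMap<>();
        for (String format : conf.getProperty("formats", "").split(",")) {
            String name = format.trim();
            if (name.isEmpty()) {
                continue;
            }
            String reader = conf.getProperty(name + ".reader");
            if (reader == null) {
                continue; // Format listed without a handler: nothing to register.
            }
            for (String ext : conf.getProperty(name + ".extensions", "").split(",")) {
                String e = ext.trim();
                if (!e.isEmpty()) {
                    readers.put(e, reader);
                }
            }
        }
        return readers;
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
            "formats=OBJ",
            "OBJ.extensions=obj,OBJ",
            "OBJ.reader=org.apache.commons.geometry.io.euclidean.threed.ObjFormatReader");
        Properties conf = new Properties();
        conf.load(new StringReader(text));
        System.out.println(readersByExtension(conf));
    }
}
```

With this layout, disabling a handler amounts to removing its name from the "formats" list, without touching any code.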

Regards,
Gilles

> just as I might call 'ImageIO.read(new File("image.png"))'.
>
> Regards,
> Matt J
>
> 
> From: Gilles Sadowski 
> Sent: Saturday, January 23, 2021 9:40 AM
> To: Commons Developers List 
> Subject: Re: [geometry] IO Modules
>
> Hi.
>
> On Fri, 22 Jan 2021 at 03:38, Matt Juntunen wrote:
> >
> > Hi Gilles,
> >
> > > Really, the main point is to separate format (contents) from filename 
> > > (container).
> >
> > This makes sense. What would you think of the approach below?
>
> I have no strong objections, as I do not grasp all the requirements.
> [Maybe, IO-related stuff is always bound to be messy (cf. "java.io" vs
> "java.nio").]
>
> > This would separate the format name from the file extension(s) and provide 
> > an enum containing default format information and handlers. Usage of the 
> > enum would be optional since there would still be overloads that accept a 
> > simple format name string.
>
> It reminds me of a discussion concerning "Bloom filters", about identifiers
> for a hash function that could be user-defined.
> IIRC, one idea (proposed by Alex) was to maintain a text file of (unique)
> identifiers.
>
> > For the BoundaryIOManager methods that accept a Path or URL, the format 
> > would still be determined by the file extension.
>
> I'm uncomfortable with having that kind of assumption in a low-level library
> (bad reminiscence of M$-DOS days).  User-code should be in charge of
> associating input (e.g. file name) with ho

Re: NPE in CSVFormat

2021-01-27 Thread Gilles Sadowski
On Thu, 28 Jan 2021 at 02:30, Gary Gregory wrote:
>
> You can't attach files to this mailing list.
>
> You'll have to figure out GitHub PRs so we can see what you're proposing to
> do ;-)

Or use JIRA...

But about the issue itself: Hasn't "Commons Logging" been superseded
by the "Log4j" project [1] ?

Gilles

[1] https://logging.apache.org/log4j/2.x/

> Gary
>
> On Wed, Jan 27, 2021, 15:40 Derek Bennett  wrote:
>
> > While writing integration tests for my software, I noticed that many calls
> > to commons logging were terminating with NullPointerException. I checked
> > out the code and added input validation to CSVFormat.printRecord. While
> > adding my unit test, I found that I broke another unit test. It appears
> > that someone had enshrined the NPE in a unit test as expected behavior. So,
> > part of my patch is the removal of that unit test. I tried creating a pull
> > request, but was unable to do so. So, I have attached the patch file here.
> > Please review it.
> > I have never created a pull request on GitHub before. So, if this is a
> > case where I should, please instruct me and I will be glad to.
> >
> > Thanks,
> > Derek Bennett




Re: [math] FastMath isn't fast...

2021-01-27 Thread Gilles Sadowski
Hi.

On Wed, 27 Jan 2021 at 17:39, Erik Svensson wrote:
>
> Hello all!
>
> I work for a fintech company and we do a lot of risk computations using, 
> among other things, FastMath.
> Recently I had the opportunity to do some performance testing using JMH and 
> found, to my surprise, that once
> you move beyond Java 8, java.lang.Math outperforms FastMath, sometimes quite 
> considerably. Graal11 especially is very performant.

Actually, this kind of issue has been noticed for a long time:
  https://issues.apache.org/jira/browse/MATH-1113
  https://issues.apache.org/jira/browse/MATH-901
  https://issues.apache.org/jira/browse/MATH-740

At the time, it seemed that the most consistent advantage of "FastMath"
was accuracy:
  https://issues.apache.org/jira/browse/MATH-1114
But this could also have evolved into "Math" having become more accurate.

>
> I’ve traced the cause to the introduction of @HotSpotIntrinsicCandidate in 
> java 9 that replaces the Java 8 JNI call.
>
> I could just use the Math package where we explicitly call the FastMath
> package but we use other commons.math stuff that depends on FastMath
> which means that they miss out on the possible performance gains.

The project includes micro-benchmarks[1] but they have been obsoleted by
the advent of JMH; so you are quite welcome to contribute your JMH tests.

>
> I’m wondering if there is any effort to handle this in FastMath as the gains 
> are quite considerable.

The next major version is supposed to require Java 8.
The first step would be to establish, for each function,
 * whether the "FastMath" version is slower, and if so
 * whether it is more accurate
than its "Math" equivalent.
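
A rough way to start on that per-function comparison is sketched below. It is not a substitute for JMH benchmarks, and it only measures the discrepancy between two implementations (a true accuracy assessment needs a high-precision reference). To stay self-contained, "Math::sin" and "StrictMath::sin" are used as stand-ins; with Commons Math on the classpath one would pass "FastMath::sin" instead. The class and method names (`FunctionComparison`, `maxAbsDifference`, `nanosPerCall`) are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

public class FunctionComparison {
    /** Returns the i-th point of a uniform grid of "samples" points on [min, max]. */
    private static double point(double min, double max, int i, int samples) {
        if (samples < 2) {
            throw new IllegalArgumentException("At least 2 samples are required");
        }
        return min + (max - min) * i / (samples - 1);
    }

    /** Largest absolute difference between two implementations on the grid. */
    static double maxAbsDifference(DoubleUnaryOperator f, DoubleUnaryOperator g,
                                   double min, double max, int samples) {
        double worst = 0;
        for (int i = 0; i < samples; i++) {
            double x = point(min, max, i, samples);
            worst = Math.max(worst, Math.abs(f.applyAsDouble(x) - g.applyAsDouble(x)));
        }
        return worst;
    }

    /** Crude wall-clock estimate (nanoseconds per call); indicative only. */
    static double nanosPerCall(DoubleUnaryOperator f, double min, double max, int samples) {
        double sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            sink += f.applyAsDouble(point(min, max, i, samples));
        }
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            // Uses the accumulated value so that the loop cannot be optimized away.
            System.out.println("NaN encountered");
        }
        return (double) elapsed / samples;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("max |Math.sin - StrictMath.sin| = "
                           + maxAbsDifference(Math::sin, StrictMath::sin, 0, Math.PI, n));
        System.out.println("Math.sin: " + nanosPerCall(Math::sin, 0, Math.PI, n) + " ns/call");
        System.out.println("StrictMath.sin: " + nanosPerCall(StrictMath::sin, 0, Math.PI, n) + " ns/call");
    }
}
```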

> One solution would be to check for the annotation in the Math package, and if 
> it’s available, use the Math package instead.

See Matt's reply.

Thanks,
Gilles

[1] 
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=src/userguide/java/org/apache/commons/math4/userguide/FastMathTestPerformance.java;h=d140393d224b4ff6592224f48b59e8950963b49b;hb=HEAD




Re: [geometry] IO Modules

2021-01-23 Thread Gilles Sadowski
Hi.

On Fri, 22 Jan 2021 at 03:38, Matt Juntunen wrote:
>
> Hi Gilles,
>
> > Really, the main point is to separate format (contents) from filename 
> > (container).
>
> This makes sense. What would you think of the approach below?

I have no strong objections, as I do not grasp all the requirements.
[Maybe, IO-related stuff is always bound to be messy (cf. "java.io" vs
"java.nio").]

> This would separate the format name from the file extension(s) and provide an 
> enum containing default format information and handlers. Usage of the enum 
> would be optional since there would still be overloads that accept a simple 
> format name string.

It reminds me of a discussion concerning "Bloom filters", about identifiers
for a hash function that could be user-defined.
IIRC, one idea (proposed by Alex) was to maintain a text file of (unique)
identifiers.

> For the BoundaryIOManager methods that accept a Path or URL, the format would 
> still be determined by the file extension.

I'm uncomfortable with having that kind of assumption in a low-level library
(bad reminiscence of M$-DOS days).  User-code should be in charge of
associating input (e.g. file name) with how to handle it (e.g. the instantiation
of the read handler).

> If users want to use a non-standard file extension, they can open the IO 
> stream themselves and use the read/write methods that accept an IO stream and 
> format string name or Format instance.

What is "standard"/"non-standard"?  You use "txt", but the most standard
meaning of this extension is that the contents are ASCII-encoded...
And "csv" is also not sufficient to convey that the contents are actually much
more constrained than a comma-separated list of strings.

Couldn't a file be used to define which reader/writer the library should
instantiate, and to which extension it could be associated?

>
> interface Format {
>     String getName();
>     List<String> getFileExtensions();
> }
>
> class BoundaryIOManager {
>     void register(BoundaryFormat fmt, BoundaryReadHandler rh,
>                   BoundaryWriteHandler wh) {
>         register(fmt.getName(), fmt.getFileExtensions(), rh, wh);
>     }
>     void register(String formatName, List<String> extensions,
>                   BoundaryReadHandler rh, BoundaryWriteHandler wh) {...}
>
>     // ...
>
>     void write(BoundarySource src, OutputStream out, Format fmt) {
>         write(src, out, fmt.getName());
>     }
>     void write(BoundarySource src, OutputStream out, String formatName)
>     {...}
>
>     // similar read methods ...
> }
>
> enum StandardFormat3D implements Format {
>     OBJ(...),
>     TXT(...),
>     CSV(...);
>
>     public String getName() {...}
>     public List<String> getFileExtensions() {...}
>     public BoundaryReadHandler3D readHandler() { (execute a supplier
>         function)... }
>     public BoundaryWriteHandler3D writeHandler() { (execute a supplier
>         function)... }
> }
>
> > The "enum" is for natively supported formats to allow for simple API, while 
> > "hiding" the actual implementations (as in "RandomSource" from "Commons 
> > RNG").
>
> I'd prefer to not hide the format-specific classes, at least not completely.

Then the case for the "enum" is moot (IIUC).

> For example, the OBJ file format can contain a lot more information than just 
> pure geometry, such as object names (more than one geometry can be contained 
> in a single file), material information (for use in rendering), free-form 
> curve definitions, etc. This information is not used to produce 
> BoundarySource3D or Mesh instances but it can be accessed easily by extending 
> AbstractOBJParser or PolygonOBJParser. Also, additional information such as 
> comments and object names can be included in output files if the OBJWriter 
> class is used directly, as opposed to IO3D or BoundaryIOManager3D. It seems 
> like a waste to completely hide this functionality.

I agree to not waste functionality.  But how is the additional content
handled currently?  It seems that it is simply discarded, and someone
wanting to retrieve it would then discard the current functionality that
only returns a "BoundarySource3D".
Sorry if I'm missing something because of my not having read the code
but this makes me think that a parser generator would have allowed
for extending the support of a given format.

>
> Another reason to keep these classes public is that they may need to be 
> accessed in order to configure them. For example, the txt, csv, and obj 
> formats use a default format pattern for writing floating point numbers as 
> text. If this nee

Re: [geometry] IO Modules

2021-01-21 Thread Gilles Sadowski
Hello.

On Wed, 20 Jan 2021 at 23:55, Matt Juntunen wrote:
>
> Hi Gilles,
>
> I've updated the PR with the new module/package names.
>
> > I don't see the link between "(not) extensible" and "enum": Extensibility 
> > is provided by API (which classes are public and meant to be reused, e.g. 
> > by custom IO code) while the "enum" defines the file formats that are 
> > "natively" supported by this library.
>
> I might not be picturing this correctly so perhaps a code example would help. 
> Here is what I'd like to be able to do with the IO code:
>
> // add custom handler for my own file format
> IO3D.getDefaultManager().registerReadHandler("matt", new 
> MattFileReadHandler());
>
> // read a file using that format
> BoundarySource3D result = IO3D.read(Paths.get("mygeometry.matt"));
>
> I don't see how the above could be accomplished if the supported formats are 
> in an enum.

The "enum" is for natively supported formats to allow for simple API, while
"hiding" the actual implementations (as in "RandomSource" from "Commons
RNG").
Really, the main point is to separate format (contents) from filename
(container).

// Library code.

/** Native file formats. */
enum FileFormat {
    // "ObjFileReadHandler" and "ObjFileWriteHandler" must be "internal".
    OBJ(new ObjFileReadHandler(), new ObjFileWriteHandler());

    private final FileReadHandler rh;
    private final FileWriteHandler wh;

    FileFormat(FileReadHandler rh,
               FileWriteHandler wh) {
        this.rh = rh;
        this.wh = wh;
    }

    // Methods to support "register" and "registerSuffix" (see below).
}

// Method does not exist yet.
IO3D.getDefaultManager().register(FileFormat.OBJ);

// In user code

// Choose format, and file name (arbitrary).
IO3D.write(src, filename, FileFormat.OBJ);

// Associate user-preferred suffix(es) to a natively supported file format...
IO3D.getDefaultManager().registerSuffix(FileFormat.OBJ, ".obj",
".OBJ"); // Method does not exist yet.
// .. and the library will add the default suffix (".obj").
IO3D.write(src, filenamePrefix, FileFormat.OBJ);

// User-defined handlers (this "register" method does not exist yet).
final String id = IO3D.getDefaultManager().register(new MattFileReadHandler(),
                                                    new MattFileWriteHandler());
IO3D.getDefaultManager().registerSuffix(id, "matt");
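
The outline above can be turned into a small self-contained sketch, to show that an "enum" of native formats and a registry for user-defined handlers are compatible. Everything here is hypothetical (the names `ReadHandler`, `Manager`, `register`, `registerSuffix`, `forFileName`) and does not correspond to the actual Commons Geometry API; only read handlers are modelled, to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

public class FormatRegistrySketch {
    /** Hypothetical handler type; a real one would parse an input stream. */
    interface ReadHandler {
        String describe();
    }

    /** Natively supported formats; implementations stay hidden behind the enum. */
    enum FileFormat {
        OBJ(() -> "native OBJ reader");

        private final ReadHandler rh;

        FileFormat(ReadHandler rh) {
            this.rh = rh;
        }

        ReadHandler readHandler() {
            return rh;
        }
    }

    /** Associates format identifiers with handlers, and suffixes with identifiers. */
    static class Manager {
        private final Map<String, ReadHandler> handlersById = new HashMap<>();
        private final Map<String, String> idBySuffix = new HashMap<>();
        private int customCount;

        /** Registers a native format; its identifier is the enum name. */
        String register(FileFormat fmt) {
            handlersById.put(fmt.name(), fmt.readHandler());
            return fmt.name();
        }

        /** Registers a user-defined handler and returns a generated identifier. */
        String register(ReadHandler custom) {
            String id = "custom-" + (++customCount);
            handlersById.put(id, custom);
            return id;
        }

        /** Lets the user decide which suffixes map to which format. */
        void registerSuffix(String id, String... suffixes) {
            if (!handlersById.containsKey(id)) {
                throw new IllegalArgumentException("Unknown format: " + id);
            }
            for (String s : suffixes) {
                idBySuffix.put(s, id);
            }
        }

        /** Returns the handler associated with the file name suffix, or null. */
        ReadHandler forFileName(String fileName) {
            int dot = fileName.lastIndexOf('.');
            String suffix = dot < 0 ? "" : fileName.substring(dot + 1);
            String id = idBySuffix.get(suffix);
            return id == null ? null : handlersById.get(id);
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        manager.registerSuffix(manager.register(FileFormat.OBJ), "obj", "OBJ");
        manager.registerSuffix(manager.register(() -> "user-defined reader"), "matt");

        System.out.println(manager.forFileName("cube.obj").describe());
        System.out.println(manager.forFileName("mygeometry.matt").describe());
    }
}
```

The format (contents) is identified independently of the file name (container); the suffix lookup is a convenience layered on top, which user code is free to bypass.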

> Also, the file extension approach above is similar to that used by 
> javax.imageio.ImageIO.

IIUC, they also make the difference pointed out above:
https://docs.oracle.com/javase/7/docs/api/javax/imageio/ImageIO.html#getImageReadersByMIMEType(java.lang.String)
https://docs.oracle.com/javase/7/docs/api/javax/imageio/ImageIO.html#getImageReadersBySuffix(java.lang.String)

Sorry if I've cut some corners...

Regards,
Gilles

>
> Regards,
> Matt
>
> 
> From: Gilles Sadowski 
> Sent: Wednesday, January 20, 2021 5:28 PM
> To: Commons Developers List 
> Subject: Re: [geometry] IO Modules
>
> On Wed, 20 Jan 2021 at 22:42, Matt Juntunen wrote:
> >
> > Hi Gilles,
> >
> > > package "io" does not belong in "core".
> >
> > Ah, yes. I'd like to keep the core and euclidean IO modules separate since 
> > I'm picturing a spherical IO module later on down the road. In that case, 
> > perhaps we could just put the "io" portion of the module/package name 
> > before everything else? That would give us
> >
> > commons-geometry-io-core - contains package o.a.c.geometry.io.core
> > commons-geometry-io-euclidean - contains package 
> > o.a.c.geometry.io.euclidean
>
> Yes; having "io" as the top-level package is what I was suggesting.
>
> >
> > > Then, we had also discussed that it would have been more robust to use a 
> > > parser generator...
> >
> > Yes. I looked into that and ended up deciding that a full parser-generator 
> > would be overkill for what I wanted to do and possibly more work to 
> > maintain/customize. I've placed the current low-level parsing code 
> > (consisting of the CharReadBuffer and SimpleTextParser classes) in the 
> > "internal" package o.a.c.geometry.core.io.internal. I don't intend for 
> > these classes to be part of the public API, which would allow us to switch 
> > to a parser-generator later if needed.
>
> OK!
>
> >
> > > I haven't read all the code (sorry!)
> >
> > No worries. There is a lot of it :-)
> >
> > > I'm not convinced by the usage of file "extensions" to determine how the 
> > >

Re: [geometry] IO Modules

2021-01-20 Thread Gilles Sadowski
On Wed, 20 Jan 2021 at 22:42, Matt Juntunen wrote:
>
> Hi Gilles,
>
> > package "io" does not belong in "core".
>
> Ah, yes. I'd like to keep the core and euclidean IO modules separate since 
> I'm picturing a spherical IO module later on down the road. In that case, 
> perhaps we could just put the "io" portion of the module/package name before 
> everything else? That would give us
>
> commons-geometry-io-core - contains package o.a.c.geometry.io.core
> commons-geometry-io-euclidean - contains package 
> o.a.c.geometry.io.euclidean

Yes; having "io" as the top-level package is what I was suggesting.

>
> > Then, we had also discussed that it would have been more robust to use a 
> > parser generator...
>
> Yes. I looked into that and ended up deciding that a full parser-generator 
> would be overkill for what I wanted to do and possibly more work to 
> maintain/customize. I've placed the current low-level parsing code 
> (consisting of the CharReadBuffer and SimpleTextParser classes) in the 
> "internal" package o.a.c.geometry.core.io.internal. I don't intend for these 
> classes to be part of the public API, which would allow us to switch to a 
> parser-generator later if needed.

OK!

>
> > I haven't read all the code (sorry!)
>
> No worries. There is a lot of it :-)
>
> > I'm not convinced by the usage of file "extensions" to determine how the 
> > contents should be parsed.  At first sight, an "enum"-based approach[3] 
> > would be more flexible and clearer from a code design POV.
>
> In contrast to the rest of the library, I'd like to make these IO modules as 
> extensible as possible, the reason being that they form the main connection 
> between the library and external systems. If we use an enum to indicate the 
> format, we restrict usage of the API to only those formats. A string name, on 
> the other hand, allows users to define their own formats and use them with 
> the API.

[I'm only answering based on the impression I get from your description above.]
I don't see the link between "(not) extensible" and "enum":
Extensibility is provided by the API (which classes are public and meant to be
reused, e.g. by custom IO code) while the "enum" defines the file formats that
are "natively" supported by this library.
[And, in time, custom code can become part of the supported formats and get its
corresponding "enum" element.]
The file extensions could be supported too, but I would think at a higher level;
mapping to either the "enum", or to custom code.
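
To make the above more concrete, here is a minimal sketch of the layering I have in mind. All names ("GeometryReader", "NativeFormat", "ReaderRegistry") are hypothetical and are not taken from the PR; the "enum" lists the natively supported formats, while the extension lookup sits at a higher level and can also point to custom code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical reader abstraction; NOT the actual commons-geometry interface. */
interface GeometryReader {
    /** Short description of the format handled by this reader. */
    String describe();
}

/** Formats that the library would support "natively". */
enum NativeFormat implements GeometryReader {
    TXT, CSV, OBJ;

    @Override
    public String describe() {
        return "native " + name();
    }
}

/** Higher-level lookup: maps a file extension to a native format or to custom code. */
final class ReaderRegistry {
    private final Map<String, GeometryReader> byExtension = new ConcurrentHashMap<>();

    ReaderRegistry() {
        // Native formats are registered under their lower-case name.
        for (NativeFormat f : NativeFormat.values()) {
            byExtension.put(f.name().toLowerCase(Locale.ROOT), f);
        }
    }

    /** Registers custom code for an extension (case-insensitive). */
    void register(String extension, GeometryReader reader) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), reader);
    }

    /** Returns the reader matching the extension of the file name, if any. */
    Optional<GeometryReader> forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }
}
```

With such a layout, a user-defined format is added through "register" without touching the "enum", and it could later be promoted to a "NativeFormat" element.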

> For example, we will soon be creating a custom geometry format for an 
> application at my day job and I would like to be able to use these IO modules 
> to read and write that format (using custom BoundaryReadHandler3D and 
> BoundaryWriteHandler3D implementations).

Is what I wrote above preventing that?

Best,
Gilles

>
> Thoughts?
>
> Regards,
> Matt J
>
>
>
> 
> From: Gilles Sadowski 
> Sent: Wednesday, January 20, 2021 9:17 AM
> To: Commons Developers List 
> Subject: Re: [geometry] IO Modules
>
> Hi Matt.
>
> On Wed, 20 Jan 2021 at 05:03, Matt Juntunen wrote:
> >
> > Hello,
> >
> > I've created GEOMETRY-115 and an associated PR [1] containing new modules 
> > for IO functionality. The new modules are
> >
> >   *   commons-geometry-core-io - Common space-independent interfaces and 
> > classes
> >   *   commons-geometry-euclidean-io - Euclidean IO classes; currently 
> > contains support for the 3D formats TXT, CSV, and OBJ
> >
> > The API is based on a core BoundaryIOManager class that delegates to 
> > BoundaryReadHandler and BoundaryWriteHandler implementations based on the 
> > requested data format. For Euclidean 3D space, a convenience IO3D class is 
> > provided with static methods that delegate to a default manager instance. 
> > In addition to reading and writing the core geometric types for the library 
> > (ConvexPolygon3D, Triangle3D), the Euclidean module also supports reading 
> > and writing a FacetDefinition interface, which exposes simple, unvalidated 
> > geometric data. This is intended for accessing raw (possibly invalid) 
> > geometric data from files and writing data contained in external data 
> > structures (for example, a custom facet class). The example below is from 
> > the IO3D class documentation and demonstrates a read, transform, write 
> > operation using streams.
> >
> > final Path origFile = tempDir.resolve("orig.obj");
> > final Path scaledFile = tem

Re: [geometry] IO Modules

2021-01-20 Thread Gilles Sadowski
Hi Matt.

On Wed, 20 Jan 2021 at 05:03, Matt Juntunen wrote:
>
> Hello,
>
> I've created GEOMETRY-115 and an associated PR [1] containing new modules for 
> IO functionality. The new modules are
>
>   *   commons-geometry-core-io - Common space-independent interfaces and 
> classes
>   *   commons-geometry-euclidean-io - Euclidean IO classes; currently 
> contains support for the 3D formats TXT, CSV, and OBJ
>
> The API is based on a core BoundaryIOManager class that delegates to 
> BoundaryReadHandler and BoundaryWriteHandler implementations based on the 
> requested data format. For Euclidean 3D space, a convenience IO3D class is 
> provided with static methods that delegate to a default manager instance. In 
> addition to reading and writing the core geometric types for the library 
> (ConvexPolygon3D, Triangle3D), the Euclidean module also supports reading and 
> writing a FacetDefinition interface, which exposes simple, unvalidated 
> geometric data. This is intended for accessing raw (possibly invalid) 
> geometric data from files and writing data contained in external data 
> structures (for example, a custom facet class). The example below is from the 
> IO3D class documentation and demonstrates a read, transform, write operation 
> using streams.
>
> final Path origFile = tempDir.resolve("orig.obj");
> final Path scaledFile = tempDir.resolve("scaled.csv");
>
> final DoublePrecisionContext precision = new EpsilonDoublePrecisionContext(1e-10);
> final BoundarySource3D src = Parallelepiped.unitCube(precision);
>
> IO3D.write(src, origFile);
>
> final AffineTransformMatrix3D transform = AffineTransformMatrix3D.createScale(2);
>
> try (Stream<Triangle3D> stream = IO3D.triangles(origFile, precision)) {
>     IO3D.write(stream.map(t -> t.transform(transform)), scaledFile);
> }
>
> Feedback is welcome.

The high-level API, illustrated in the example above, looks quite fine.
But we should enforce "separation of concerns"; in this particular case:
package "io" does not belong in "core".

The Java package
  org.apache.commons.geometry.core
is defined within the (maven) module
  commons-geometry-core
The new Java package
  org.apache.commons.geometry.core.io
is defined within the (maven) module
  commons-geometry-core-io

The (maven) module name is also often used to define the
Java 9+ module name for JPMS.[1]
IIUC, the package
  org.apache.commons.geometry.core
and its "sub-package"
  org.apache.commons.geometry.core.io
cannot belong to different (JPMS) modules, because (IIUC) a package
hierarchy cannot be shared across modules (since one of the purposes
is to organize "visibility").
Components should be JPMS-compatible.[2]

Fixing that would just require us to
1. remove the two modules introduced in PR #130
  commons-geometry-core-io
  commons-geometry-euclidean-io
2. create one (maven) module
  commons-geometry-io
owning the Java packages
  org.apache.commons.geometry.io
  org.apache.commons.geometry.io.core
  org.apache.commons.geometry.io.euclidean

[If you think that it's really worth having separate modules for loading
"euclidean" data vs other space-types (not yet implemented IIUC),
point (2) would be adapted accordingly.]

This would fix both the JPMS issue, and the design issue.

Then, we had also discussed that it would have been more robust to
use a parser generator...
In either case, I'd suggest that we put the actual I/O classes in an
"internal" package, and only advertise the above high-level API (i.e. we
should not be expected to maintain compatibility of the parser code).

I haven't read all the code (sorry!), but in class "IO3D", I'm not convinced
by the usage of file "extensions" to determine how the contents should be
parsed.  At first sight, an "enum"-based approach[3] would be more flexible
and clearer from a code design POV.

Regards,
Gilles

[1] https://www.oracle.com/corporate/features/understanding-java-9-modules.html
[2] See 
https://gitbox.apache.org/repos/asf?p=commons-rng.git;a=tree;f=commons-rng-examples/examples-jpms;hb=HEAD
[3] See 
https://gitbox.apache.org/repos/asf?p=commons-rng.git;a=blob;f=commons-rng-simple/src/main/java/org/apache/commons/rng/simple/RandomSource.java;h=65ca02fdc285fbb7ea8305008dbce21f571191d7;hb=HEAD

>
> Regards,
> Matt J
>
> [1] https://github.com/apache/commons-geometry/pull/130




Re: [All][Math] New GA component

2021-01-20 Thread Gilles Sadowski
Hello.

On Wed, 20 Jan 2021 at 11:11, Avijit Basak wrote:
>
> Hello Gilles Sadowski
>
>  Thanks for your reply. Yes I intend to contribute to enhancement
> of the GA functionality as per the JIRA (MATH-1563) proposal.

My proposal was to first create a new component (and, thus, implement
the enhancement over there).
Do you agree to perform the port?  As said in the previous message, this
should be relatively easy, but will require populating a new "git" repository,
using a recent and similar project's (e.g. "Commons Numbers") files as
templates.

> If I find any
> other changes suitable I would also propose the same. Could you kindly look
> into the approval process for this JIRA.

There is no "process" other than the discussions taking place here, on
the "dev" ML.

Regards,
Gilles

>
> Thanks & Regards
> --Avijit Basak
>
> On Wed, 20 Jan 2021 at 04:11, Gilles Sadowski  wrote:
>
> > Hi Avijit.
> >
> > [I've changed the "Subject:" line.]
> >
> > On Tue, 19 Jan 2021 at 08:31, Avijit Basak wrote:
> > >
> > > Hello Gilles Sadowski
> > >
> > >  I have extended the current implementation of the Genetic Algorithm
> > > in the a.c.m package and made the probability generation process adaptive. A
> > > significant performance improvement was observed as a result. The
> > > current implementation in a.c.m.GA is a simple genetic
> > > algorithm, which is not very efficient or useful. I have extended
> > > the same framework to incorporate the enhancement as part of my work.
> > > The library could also be extended to incorporate other advanced
> > > concepts of Genetic Programming.
> >
> > Do you intend to carry out, or otherwise contribute further to, the
> > enhancement of the GA functionality?
> >
> > >  To compare with other libraries I have chosen a.c.m because of
> > > its flexible and extensible design.
> >
> > That's good news, although we never had much feedback about that code
> > base...
> >
> > >  This is to be decided if we need a new component or extend the
> > same component.
> >
> > The functionality in package "o.a.c.m.genetics" does not depend on
> > functionality in other packages (except for exceptions).  Setting up a new
> > component would thus be very easy.
> > Doing so will bring the same maintenance advantage as we have witnessed
> > with the other Commons Math spin-offs.
> >
> > Regards,
> > Gilles
> >
> > >> [...]
> >




[All][Math] New GA component

2021-01-19 Thread Gilles Sadowski
Hi Avijit.

[I've changed the "Subject:" line.]

On Tue, 19 Jan 2021 at 08:31, Avijit Basak wrote:
>
> Hello Gilles Sadowski
>
>  I have extended the current implementation of the Genetic Algorithm in the
> a.c.m package and made the probability generation process adaptive. A
> significant performance improvement was observed as a result. The
> current implementation in a.c.m.GA is a simple genetic
> algorithm, which is not very efficient or useful. I have extended the
> same framework to incorporate the enhancement as part of my work. The
> library could also be extended to incorporate other advanced concepts of
> Genetic Programming.

Do you intend to carry out, or otherwise contribute further to, the
enhancement of the GA functionality?

>  To compare with other libraries I have chosen a.c.m because of its
> flexible and extensible design.

That's good news, although we never had much feedback about that code
base...

>  This is to be decided if we need a new component or extend the same 
> component.

The functionality in package "o.a.c.m.genetics" does not depend on functionality
in other packages (except for exceptions).  Setting up a new component would
thus be very easy.
Doing so will bring the same maintenance advantage as we have witnessed with
the other Commons Math spin-offs.

Regards,
Gilles

>> [...]




Re: [All][Math] GH PR and Travis-CI

2020-12-22 Thread Gilles Sadowski
Hello.

On Tue, 22 Dec 2020 at 12:57, Peter Lee wrote:
>
> > but the build result is not shown in github.
>
> Correction: the build result is shown in github. I missed it because the tab
> was hidden. :(
>
> It seems the Travis build was triggered 10 hours after the commit was
> submitted.

Ah, OK; sorry for the noise.
Thanks for looking into it.

Regards,
Gilles

> cheers,
> Lee
>
> On 12 22 2020, at 7:50, Peter Lee  wrote:
> > It's weird because Travis seems to have a successful build with this PR:
> > https://travis-ci.com/github/apache/commons-math/builds/210145524
> > but the build result is not shown in github.
> > Maybe it's just an accidental error and we can wait until the next PR to have a look?
> >
> > cheers,
> > Lee
> >
> >
> > On 12 21 2020, at 11:31, Gilles Sadowski  wrote:
> > > Hi.
> > >
> > > Does someone know why PRs don't seem to trigger a Travis
> > > build anymore?
> > >
> > > Example:
> > > PR
> > > https://github.com/apache/commons-math/pull/168
> > > is not picked up here:
> > > https://travis-ci.com/github/apache/commons-math/pull_requests
> > > while nothing has changed in
> > > https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=.travis.yml;h=44fa4f89543fc03fa18974117dda614910e3edad;hb=HEAD
> > > (but perhaps should have?)
> > >
> > > Regards,
> > > Gilles




[All][Math] GH PR and Travis-CI

2020-12-21 Thread Gilles Sadowski
Hi.

Does someone know why PRs don't seem to trigger a Travis
build anymore?

Example:
PR
https://github.com/apache/commons-math/pull/168
is not picked up here:
https://travis-ci.com/github/apache/commons-math/pull_requests
while nothing has changed in

https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=.travis.yml;h=44fa4f89543fc03fa18974117dda614910e3edad;hb=HEAD
(but perhaps should have?)

Regards,
Gilles




Re: [All][Math] A Proposal for Implementation of Adaptive Probability Generation Strategy for Genetic Algorithm

2020-12-19 Thread Gilles Sadowski
Hello Avijit Basak.

To get things moving, you could perhaps let us know about your current
usage of the GA functionality implemented in "Commons Math" and what
your plan is to expand on it, if it were to become a new component of this
Apache project (e.g. how it would compare with alternative Java projects
that provide GA).

Thanks,
Gilles

On Fri, 18 Dec 2020 at 16:56, Gilles Sadowski wrote:
>
> On Fri, 18 Dec 2020 at 16:25, Avijit Basak wrote:
> >
> > Hi All
> >
> > I would like to propose incorporation of adaptive probability
> > generation strategy for Genetic Algorithm implementation of apache commons
> > maths library.
> > Currently Apache's API works on constant probability strategy. I
> > have done some work on the adaptive approach and published in this article "
> > https://www.ijcaonline.org/archives/volume175/number10/basak-2020-ijca-920572.pdf
> > ".
> > I have created a JIRA "MATH-1563" to describe the same.
> > Kindly let me know your views on the same.
>
> FTR, a discussion about how to proceed had started on JIRA:
> https://issues.apache.org/jira/browse/MATH-1563
>
> In short, my take on this is that further development should happen in
> a dedicated (new) component, as per the same rationale that led to
> "Commons RNG", etc.
>
> Regards,
> Gilles




[All][Math] A Proposal for Implementation of Adaptive Probability Generation Strategy for Genetic Algorithm

2020-12-18 Thread Gilles Sadowski
On Fri, 18 Dec 2020 at 16:25, Avijit Basak wrote:
>
> Hi All
>
> I would like to propose incorporation of adaptive probability
> generation strategy for Genetic Algorithm implementation of apache commons
> maths library.
> Currently Apache's API works on constant probability strategy. I
> have done some work on the adaptive approach and published in this article "
> https://www.ijcaonline.org/archives/volume175/number10/basak-2020-ijca-920572.pdf
> ".
> I have created a JIRA "MATH-1563" to describe the same.
> Kindly let me know your views on the same.

FTR, a discussion about how to proceed had started on JIRA:
https://issues.apache.org/jira/browse/MATH-1563

In short, my take on this is that further development should happen in
a dedicated (new) component, as per the same rationale that led to
"Commons RNG", etc.

Regards,
Gilles




Re: [geometry] Parse error type

2020-12-18 Thread Gilles Sadowski
Hello Matt.

On Fri, 18 Dec 2020 at 16:16, Matt Juntunen wrote:
>
> Hi all,
>
> I've created a simple text parser

Did you consider leaving that job to a specialized tool[1] ?
Because...

> class [1] as part of writing the 3D file format IO functionality for 
> GEOMETRY-101. One of the primary goals of the class is to provide 
> standardized and informative parse errors that include line and column 
> numbers from the text input.

... that's what they do. ;-)

> (Nothing is worse than attempting to process a file and receiving a 
> completely generic, unhelpful exception message.) That portion is working 
> well. However, I'm wondering what exception type to use for these 
> standardized parse errors. I'm currently using IllegalStateException since 
> you can conceptualize the parser as a state machine. However, I feel like 
> there is probably a better solution.
>
> Any ideas?

Any reason for not reusing an existing tool?
If not, I'd be wary of introducing/maintaining such (usually lengthy) code.

In either case (using an existing library or implementing the parser if there
is indeed a good reason to do it), I'd think that it is a good time to create
a new maven module for hosting this functionality; it is more than a usage
"example", and could become the indispensable complement to the "core"
modules (as IIUC the parser creates the Java objects specific to this
component).
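
On the exception-type question quoted above: if the parser stays hand-written, one option (only a sketch, not what the PR does) is a dedicated unchecked exception that carries the position, rather than a generic "IllegalStateException". The names "TextParseException" and "NumberLineParser" are hypothetical, and the toy parser just reads one number per line.

```java
/** Hypothetical unchecked exception carrying the position of a parse failure. */
final class TextParseException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    private final int line;
    private final int column;

    TextParseException(int line, int column, String detail) {
        super("Parse error at line " + line + ", column " + column + ": " + detail);
        this.line = line;
        this.column = column;
    }

    /** 1-based line number of the failure. */
    int getLine() {
        return line;
    }

    /** 1-based column number of the failure. */
    int getColumn() {
        return column;
    }
}

/** Toy parser: reads one decimal number per line and reports where parsing fails. */
final class NumberLineParser {
    private NumberLineParser() {}

    static double[] parse(String text) {
        String[] lines = text.split("\n", -1);
        double[] values = new double[lines.length];
        for (int i = 0; i < lines.length; i++) {
            String raw = lines[i];
            try {
                values[i] = Double.parseDouble(raw.trim());
            } catch (NumberFormatException e) {
                // Report the column of the first non-blank character on the line.
                int col = 1;
                while (col <= raw.length() && Character.isWhitespace(raw.charAt(col - 1))) {
                    col++;
                }
                throw new TextParseException(i + 1, col,
                        "expected a number but found \"" + raw.trim() + "\"");
            }
        }
        return values;
    }
}
```

Callers that care about the position can catch the dedicated type and query "getLine()" and "getColumn()"; others still get an informative message.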

Regards,
Gilles

[1] E.g. https://javacc.github.io/javacc (I'm not using it, I've not tested it).

> [...]




Re: [geometry] Rename BoundarySource factory methods

2020-12-17 Thread Gilles Sadowski
On Thu, 17 Dec 2020 at 03:48, Matt Juntunen wrote:
>
> Hi all,
>
> I opened GEOMETRY-109 to rename the BoundarySource2D and BoundarySource3D 
> "from()" static factory methods to "of()" to be more in line with the JDK's 
> similar Stream.of() method. I also think it better reflects the performed 
> operation since no processing is performed on the input.

+1

> It's a small change but it would be a breaking change in the public API 
> (still in beta) so I'd like to make sure that everyone is on board with it.

Thanks,
Gilles




Re: [MATH] - Truncated Normal Distribution

2020-12-01 Thread Gilles Sadowski
Hello.

On Tue, 1 Dec 2020 at 06:02, Marko Malenic wrote:
>
> Awesome.
>
> Should I submit a jira ticket for this?

Yes.
https://issues.apache.org/jira/browse/STATISTICS

Regards,
Gilles

>
> On Tue, Dec 1, 2020 at 12:11 PM Gilles Sadowski 
> wrote:
>
> > Hello.
> >
> > On Tue, 1 Dec 2020 at 01:42, Marko Malenic wrote:
> > >
> > >  Hi,
> > >
> > > I'm a bit new to all this stuff, so bear with me while I ask some
> > > questions :)
> > >
> > > There's a few ways to do this.
> > >
> > > In terms of number generation, there's a few algorithms, some of which are
> > > described at:
> > > https://en.wikipedia.org/wiki/Truncated_normal_distribution#Computational_methods
> > > Any preferences on how to generate the numbers?
> > >
> > > I noticed sampling is split off to commons rng.
> > > Should another sampler be added, depending on the algorithm?
> > > Or maybe just using inverse transform sampling would be okay.
> >
> > Some of the implemented distributions use the inverse transform.
> > It's fine and sane to not do everything at once. ;-)
> >
> > Indeed, if you implement another sampler, it must go into the
> > "sampling" module of "Commons RNG", reusing functionality
> > already implemented there (if applicable).
> >
> > Regards,
> > Gilles
> >
> > > [...]




Re: [MATH] - Truncated Normal Distribution

2020-11-30 Thread Gilles Sadowski
Hello.

On Tue, 1 Dec 2020 at 01:42, Marko Malenic wrote:
>
>  Hi,
>
> I'm a bit new to all this stuff, so bear with me while I ask some questions
> :)
>
> There's a few ways to do this.
>
> In terms of number generation, there's a few algorithms, some of which are
> described at:
> https://en.wikipedia.org/wiki/Truncated_normal_distribution#Computational_methods
> Any preferences on how to generate the numbers?
>
> I noticed sampling is split off to commons rng.
> Should another sampler be added, depending on the algorithm?
> Or maybe just using inverse transform sampling would be okay.

Some of the implemented distributions use the inverse transform.
It's fine and sane to not do everything at once. ;-)

Indeed, if you implement another sampler, it must go into the
"sampling" module of "Commons RNG", reusing functionality
already implemented there (if applicable).
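
For illustration only, here is a self-contained sketch of inverse transform sampling for a normal distribution truncated to a finite interval. It does not use the "Commons Statistics" or "Commons RNG" API; the class name "TruncatedNormalSketch" is hypothetical, the CDF relies on a low-accuracy "erf" approximation (Abramowitz and Stegun, formula 7.1.26), and the inversion is done by bisection, which is simple but slow. It also assumes finite bounds that are not far out in the tails.

```java
import java.util.Random;

/** Hypothetical sketch: inverse transform sampling of a truncated normal. */
final class TruncatedNormalSketch {
    private TruncatedNormalSketch() {}

    /** erf approximation (Abramowitz & Stegun 7.1.26), absolute error about 1.5e-7. */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** CDF of the (non-truncated) normal distribution. */
    static double normalCdf(double x, double mean, double sd) {
        return 0.5 * (1.0 + erf((x - mean) / (sd * Math.sqrt(2.0))));
    }

    /**
     * Draws one value in [lower, upper]: a uniform deviate is mapped to a
     * probability between CDF(lower) and CDF(upper), then the CDF is inverted
     * by bisection on the truncation interval.
     */
    static double sample(Random rng, double mean, double sd, double lower, double upper) {
        double pLo = normalCdf(lower, mean, sd);
        double pHi = normalCdf(upper, mean, sd);
        double target = pLo + rng.nextDouble() * (pHi - pLo);
        double lo = lower;
        double hi = upper;
        for (int i = 0; i < 60; i++) {
            double mid = 0.5 * (lo + hi);
            if (normalCdf(mid, mean, sd) < target) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }
}
```

A real implementation would take a "UniformRandomProvider" instead of "java.util.Random" and reuse an accurate inverse CDF; the sketch only shows the principle.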

Regards,
Gilles

> [...]




Re: [MATH] - Truncated Normal Distribution

2020-11-30 Thread Gilles Sadowski
Hello.

On Mon, 30 Nov 2020 at 08:22, Marko Malenic wrote:
>
> Hi all,
>
> I'm interested in contributing here, and have been wanting to implement and
> add a truncated normal distribution. Would anyone be interested in this?

Contributions welcome. ;-)

This would be an addition for the new "Commons Statistics" component:
http://commons.apache.org/proper/commons-statistics/
in module "distribution":

https://gitbox.apache.org/repos/asf?p=commons-statistics.git;a=tree;f=commons-statistics-distribution

Thanks for your interest,
Gilles




Re: [apache/commons-codec] (doc) Remove duplicate "from"s in javadoc comments (#66)

2020-11-26 Thread Gilles Sadowski
> > > > > [...]
> > > > > This is an open-source project and to me that means being open *and*
> > > > > transparent.
> > > >
> > > > I agree, but that's beside my point: You require undue work from a
> > > > committer.  People who want/need to engage in history tracking will
> > > > need to turn to the version control system anyways.  The PR number
> > > > alone does not mean anything and is just noise in that report.
> > > >
> > >
> > > A one liner in a text file is "undue work"? Yikes.
> >
> > For such a change, yes.
> > Worse than that: IMO, "nit-pick" changes should *not* be
> > listed in "changes.xml", lest we end up with an illegible
> > report where important modifications are drowned within a
> > useless enumeration of spelling corrections.
> >
>
> It seems to me that you're losing sight of the openness and transparency
> aspect of an open source project: If someone is going to take the time and
> effort to fix small issues like typos or spelling mistakes, this should be
> documented and credited in the project files and site. I do not consider
> git history to provide this level of exposure.

YMMV: It does provide exposure if the PR was merged (cf. some GH
chart, at least last time I checked...).

> This type of polish should not be neglected; documentation and presentation
> of a project is important as it offers a glimpse as to the kind of
> attention to detail a project provides.

False attribution: It is not neglected (change was performed).
[Alex is particularly careful about "aesthetics".  And so am I.]

> This kind of change can also be a way for new contributors to get started
> participating in a project and documenting these changes is part of a
> community welcoming these contributions by actually showing, through our
> documenting them in changes.xml, that we value contributions.

False attribution: We value contributions (Alex posted an explicit
appreciation on GH).

Contributing to a free project is not a popularity contest: If looking
for "visibility" is the primary goal, that's not a good start (IMO).

> You, of course, are free to leave such contributions uncredited.

Could you please not propagate this kind of falsity?

If you are not happy with the contents of "changes.xml", you are
of course welcome to fix it.


Gilles




Re: [apache/commons-codec] (doc) Remove duplicate "from"s in javadoc comments (#66)

2020-11-26 Thread Gilles Sadowski
On Thu, 26 Nov 2020 at 16:04, Gary Gregory wrote:
>
> On Thu, Nov 26, 2020 at 10:00 AM Gilles Sadowski 
> wrote:
>
> > Gary,
> >
> > On Thu, 26 Nov 2020 at 15:47, Gary Gregory wrote:
> > >
> > > On Thu, Nov 26, 2020 at 9:43 AM Gilles Sadowski 
> > > wrote:
> > >
> > > > Hello Gary.
> > > >
> > > > On Thu, 26 Nov 2020 at 15:09, Gary Gregory wrote:
> > > > >
> > > > > Please update changes.xml when you merge a PR.
> > > >
> > > > Since when do we require that for Javadoc clean-up?
> > > >
> > >
> > > It's more about providing *transparency* and *documenting* the fact that
> > > this change came from a PR. If the change came from a Jira issue, we'd
> > > have an entry to show the Jira issue ID.
> > >
> > > This is an open-source project and to me that means being open *and*
> > > transparent.
> >
> > I agree, but that's beside my point: You require undue work from a
> > committer.  People who want/need to engage in history tracking will
> > need to turn to the version control system anyways.  The PR number
> > alone does not mean anything and is just noise in that report.
> >
>
> A one liner in a text file is "undue work"? Yikes.

For such a change, yes.
Worse than that: IMO, "nit-pick" changes should *not* be
listed in "changes.xml", lest we end up with an illegible
report where important modifications are drowned within a
useless enumeration of spelling corrections.

Gilles




Re: [apache/commons-codec] (doc) Remove duplicate "from"s in javadoc comments (#66)

2020-11-26 Thread Gilles Sadowski
Gary,

On Thu, 26 Nov 2020 at 15:47, Gary Gregory wrote:
>
> On Thu, Nov 26, 2020 at 9:43 AM Gilles Sadowski 
> wrote:
>
> > Hello Gary.
> >
> > On Thu, 26 Nov 2020 at 15:09, Gary Gregory wrote:
> > >
> > > Please update changes.xml when you merge a PR.
> >
> > Since when do we require that for Javadoc clean-up?
> >
>
> It's more about providing *transparency* and *documenting* the fact that
> this change came from a PR. If the change came from a Jira issue, we'd have
> an entry to show the Jira issue ID.
>
> This is an open-source project and to me that means being open *and*
> transparent.

I agree, but that's beside my point: You require undue work from a
committer.  People who want/need to engage in history tracking will
need to turn to the version control system anyways.  The PR number
alone does not mean anything and is just noise in that report.

Regards,
Gilles




Re: [apache/commons-codec] (doc) Remove duplicate "from"s in javadoc comments (#66)

2020-11-26 Thread Gilles Sadowski
Hello Gary.

On Thu, 26 Nov 2020 at 15:09, Gary Gregory wrote:
>
> Please update changes.xml when you merge a PR.

Since when do we require that for Javadoc clean-up?

Regards,
Gilles




Re: [commons-codec] branch master updated: Track changes

2020-11-26 Thread Gilles Sadowski
Hello.

On Thu, 26 Nov 2020 at 15:22, Gary Gregory wrote:
>
> Could you please show the PR number in the description? See the first action
> in the list. This will give us better provenance for changes.

At first thought, I don't see that this info belongs in the "changes" file.
[Rationale: Display *what* has changed (and *why* if there is an associated
report).  *How* (provenance) is provided by the commit.]

Regards,
Gilles

> [...]




Re: Hello Apache Commons :)

2020-11-25 Thread Gilles Sadowski
Hello.

On Tue, 24 Nov 2020 at 21:10, Kanak Sony wrote:
>
> Hey all,
>
> I am very new to OSS Contribution and would like to help.

Welcome.

> Is there
> anything that I can start with and pick up?

Many things! ;-)

A general approach would be to review the bug-tracking project
of the components which you are interested in.
On a personal note, I'd suggest:
https://issues.apache.org/jira/projects/MATH
whose release is long overdue.

Thanks,
Gilles




Re: Dependabot pr's

2020-10-16 Thread Gilles Sadowski
It would be so great to be able to act differently (i.e. redirecting to
*different* lists) depending on whether the sender is a bot or a human being.
This used to be considered a feature (cf. "robots.txt" for web crawlers).

Gilles

On Fri, 16 Oct 2020 at 14:36, Rob Tompkins wrote:
>
> I’m a +0.5 to a notifications (GitHub + Jira) list. This seems reasonable to 
> me.
>
> -Rob
>
> > On Oct 16, 2020, at 2:43 AM, Mark Thomas  wrote:
> >
> > On 15/10/2020 19:30, Gary Gregory wrote:
> >>> On Thu, Oct 15, 2020 at 1:57 PM Bernd Eckenfels 
> >>> wrote:
> >>>
> >>> Before we do that, I need help. I am considering ignoring or unsubscribing
> >>> from the commit mailing list, which is IMHO not a good thing (from the point
> >>> of view of security reviews). However, I cannot keep up with Dependabot
> >>> suggestions (and don’t have an easy way to filter, and frankly I don’t want
> >>> to spend any time on finding one).
> >>>
> >>> So can we turn the notifications off or at least send them to a different
> >>> mailinglist?
> >>>
> >>
> >> Dependabot emails are sent from notificati...@github.com, so we could ask
> >> infra to create a list called... gh-no...@commons.apache.org?
> >
> > notificati...@commons.apache.org would be the standard name.
> >
> > Mark




Re: Dependabot pr's

2020-10-15 Thread Gilles Sadowski
Hi.

On Thu, 15 Oct 2020 at 19:57, Bernd Eckenfels wrote:
>
> Before we do that, I need help. I am considering ignoring or unsubscribing from the
> commit mailing list, which is IMHO not a good thing (from the point of view of
> security reviews). However, I cannot keep up with Dependabot suggestions (and
> don’t have an easy way to filter, and frankly I don’t want to spend any time
> on finding one).
>
> So can we turn the notifications off or at least send them to a different 
> mailinglist?

+2
(I asked the same, some time ago.)

Gilles

> [...]



