Hello.

Le mar. 18 févr. 2020 à 04:49, 陈 涛 <chentao...@hotmail.com> a écrit :
>
> Hi Gilles:
>
>    I really do not know if anyone received my last mail, no one replay me for 
> a long time so I send it again and copy to you with another email system.

Sorry for the delay. :-}

>
> > Some remarks:
>
> > * I didn't get why the "KMeansPlusPlusCentroidInitializer" class
> > does not call the existing "KMeansPlusPlusClusterer".
> > Code seems duplicated: As there is a case for reuse, the currently
> > "private" centroid initialization code should be factored out.
>
> This is alpha version for discuss the "MiniBatchKMeansClusterer" algorithm,

I guess that you mean that we discuss your implementation of the
algorithm referenced in the Javadoc.

> and when "centroidOf" is extracted from "KMeansPlusPlusClusterer",
>
> the "KMeansPlusPlusClusterer" is not "KMeansPlusPlusClusterer" anymore but 
> "KMeansClusterer",

I don't follow.

>
> this is a significant change, so I did not reactor it.

Significant changes are welcome (since the next release will contain
other major changes anyways) if they improve the code base (like e.g.
reducing code duplication).

>
>
>
> > * In "CentroidInitializer", I'd rename "chooseCentroids" to emphasize
> > that a computation is performed (as opposed to selecting from an
> > existing data structure).

I think I'd prefer "selectCentroids".

>
> It is extract from "KMeansPlusPlusClusterer.centroidOf", should remain be 
> "centroidOf"?

I don't understand.

It would be easier if you create a pull request, so that we can clearly see
what codes are added/removed/changed.

>
> The subclass "RandomCentroidInitializer" and 
> "KMeansPlusPlusCentroidInitializer" indicate the algorithm used.
>
>
> > * Not convinced that there should be so many constructors (in most
> > cases, it makes no sense to pick default values that are likely to
> > be heavily application-dependent.
>
> I can add more constructors.

No, the constructors with default values clutter the API, for
no obvious gain IMHO.
[If the default values make sense, they must be documented.]

>
> I'd like a builder class more than constructors, but does not meet the 
> historical code style.

Now is the time for improving the API.
It would be quite helpful to create a report on JIRA with "sub-tasks"
for all such API proposed changes.

> > * Input should generally be validated: e.g. the maximum number of
> > iterations should not be changed unwittingly; rather, an exception
> > should be raised if the user passed a negative value.
>
> Thanks for your advices, I will improve these.
>
> > Could be nice to illustrate (not just with a picture, but in a table
> > with entries average over several runs) the differences in result
> > between the implementations, using various setups (number of
> > clusters, stopping criterion, etc.).
>
> I will make more tests, include benchmarks.
>
> It is a challenge for me to generate the various kinds of test data,
>
> could !nybody supply me the test data of this comparsion: 
> http://commons.apache.org/proper/commons-math/userguide/ml.html

They are generated programmatically; code is here:
    
https://gitbox.apache.org/repos/asf?p=commons-math.git;a=tree;f=src/userguide/java/org/apache/commons/math4/userguide

[I've just updated the codes so that they compiles and run using
the new dependencies (see the "README" file).]

> > "MT_64" is probably not the best default.  And this is one of the
> > parameters from which there should not be a default IMO.
>
~ I will do more tests

You don't need to test the generators; users should choose
by themselves (from those provided in "Commons RNG").

>
> > [Note: there are spurious characters in your message (see e.g. the
> > paragraph quoted just above) that make it difficult to read.]
>
> I had well format my mail in my mail box, it may been changed by the Mail 
> List service.
>
> I will try various kinds of mail editor. It will helpful if you told me which 
> mail editor is work well with the ML.

It's probably an encoding thing (setting it to "UTF-8" should be
fine).

Best regards,
Gilles

> > [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to