Re: About reuters-fkmeans-centroids

Dmitriy Lyubimov Thu, 28 Apr 2016 14:14:43 -0700

Prakash,

(1) To be clear, the ASF trademark and branding policy is not to endorse
the views of third-party publications, and to ask third-party writers to
disclose that their views are not endorsed by the ASF project. To that end,
an ASF project can't really tell you that some publication is
"(in)appropriate". Third-party publications stand on their own account and
cannot by default be tied to ASF views. That said, committers have their
own opinions, which of course vary, and some things do get linked on the
site or mentioned on Twitter via the Mahout account. But some do not. Best
practice is always to ask for pointers on the list first.

(2) I am not sure what your definition of "appropriate" is, but on a
personal note, most of these links were quite "appropriate" at the time, in
the sense that they were published before 2/2014 or before release 0.10,
and therefore described what was in the project at that time. Thus, the
MIA fuzzy k-means example in your very link dates back to June 2011 and
is relevant to a release circa 0.6 or 0.7. So if you are asking whether
those algorithms were "in the fold" back then, the answer is yes, they
were. I see no contradiction between these publications and the current
reality.

(3) If something deprecated reasonably works for a particular purpose, I
think there's no reason not to use/write about it.

*However, I just don't think most of these particular deprecated Java-based
MR algorithms work for the purposes of an established benchmark or a
standard in research -- modern edgy ML is usually much faster (and
often more convenient too).*

I don't mean to come across as preachy, but research is usually held to
quite a different standard when it comes to claims than an ad-hoc
industrial application or a blog entry. I simply can't see how any of the
MR stuff can work for that purpose today.

(4) If your "appropriate"-ness question is really about why they were
deprecated, well, there are two main reasons for that. First, it seems that
the realization of MR limitations w.r.t. iterative applications quickly
caught up with both users and contributors, and, second, most contributors
abandoned their MR contributions (most likely for the same reason). I
contributed a couple of MR algorithms back in 2010-2011, but I am absolutely
fine with them being deprecated and written off the books. If something is
not being used, or people (exactly as your case has demonstrated) don't get
answers to their questions, or bugs are not being fixed, it is difficult to
justify keeping the code. It is much easier to focus on what is actually
being used and maintained instead. That is the very banal and boring reason
for the deprecations.
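
To make the "iterative applications" point concrete, here is a minimal
sketch of fuzzy k-means in plain Python. It is not Mahout code; the function
name, parameters and sample data are hypothetical and only for illustration.
The thing to notice is the outer loop: every iteration needs a full pass
over all the points, and under MapReduce each such pass would typically be
a separate job that re-reads its input from disk.

```python
import math


def fuzzy_kmeans(points, centroids, m=2.0, max_iter=100, tol=1e-6):
    """Illustrative fuzzy k-means over a list of equal-length tuples.

    m is the fuzziness exponent (must be > 1). Returns the final
    centroids, the memberships computed in the last pass, and the
    number of passes that were run.
    """
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Full pass over the data: soft-assign every point to every centroid.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centroids.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                inv = [d ** -exponent for d in dists]
                total = sum(inv)
                memberships.append([v / total for v in inv])

        # Recompute each centroid as a mean weighted by membership ** m.
        new_centroids = []
        for j, old in enumerate(centroids):
            weights = [u[j] ** m for u in memberships]
            wsum = sum(weights)
            if wsum == 0.0:
                # No point has any weight here: keep the old centroid.
                new_centroids.append(tuple(old))
                continue
            new_centroids.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ))

        # Stop once no centroid moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships, iteration


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    final, soft, passes = fuzzy_kmeans(sample, [(0.0, 0.0), (10.0, 10.0)])
    print("passes over the data:", passes)
    print("centroids:", final)
```

Even on this toy input the loop needs more than one pass to converge, and
the number of passes is not known in advance, which is the awkward part for
a batch job model.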

(5) Finally, if your goal is simply to learn "how the project works", then
just like Suneel said, I'd suggest following the release notes and the
project site (news and howtos) -- your last link in fact should perhaps be
your first. And the list, of course.

As you can probably tell from the release notes, the last two years were
almost exclusively about multiplatform Mahout involvement with Spark,
Flink and H2O backends, as well as the Samsara environment for general
numeric analysis (but no MR stuff beyond very nominal fixes).

I also agree that the Mahout site should perhaps be clearer about the
status of the MR algorithms (it used to be clearer, I think, but all news
eventually becomes old news).

Hope this clarifies.

-d

On Thu, Apr 28, 2016 at 12:02 PM, Prakash Poudyal <prakashpoud...@gmail.com>
wrote:

> Hi!
>
> Thank you for your emails !!
>
> Actually, I need to use fuzzy clustering to cluster the sentences in my
> research. This is my goal.
>
> I started using Mahout's fuzzy k-means clustering last week. I found
> several blog links and many other helpful documents. Being new, I went
> through them because I find this the best, easiest and fastest way to
> learn how Mahout works. In my opinion, many newcomers do the same. Only
> after getting used to the tools do people focus on the work and go
> deeper.
>
> I went through many blogs and sites to learn about Mahout; some of them
> are below:
>
> http://technobium.com/introduction-to-clustering-using-apache-mahout/
>
> http://tuxdna.github.io/pages/mahout.html
>
>
> https://github.com/tdunning/MiA/blob/master/src/main/java/mia/clustering/ch09/FuzzyKMeansExample.java
>
> http://www.programering.com/a/MDNwgTMwATI.html
>
>
> https://www.safaribooksonline.com/library/view/apache-mahout-clustering/9781783284436/ch04.html
>
> https://ymnliu.wordpress.com/2015/11/05/install-apache-mahout-in-eclipse/
>
> https://mahout.apache.org/
>
> What do you say about these sites? Are these sites not appropriate?
>
> I raised my problem several times, on the mailing list and even on IRC,
> but I only got a response today :(
>
> So finally, it would be great if you could answer the following
> questions.
>
> Is Apache Mahout an appropriate tool for clustering sentences through
> fuzzy clustering?
>
> If the answer is "YES":
>
>     Which version of Mahout?
>
>     Can you write out the steps that I need to follow, or point me to
> appropriate documentation (links)?
>
>
> Thanks
> Prakash Poudyal
> Portugal
>
>
>
