Re: Supported Lucene Index Version

2016-09-13 Thread Suneel Marthi
Work off the snapshots; 0.12.2 doesn't have this fix.

Sent from my iPhone

> On Sep 13, 2016, at 6:01 PM, Reth RM <reth.ik...@gmail.com> wrote:
> 
> I'm on apache-mahout-distribution-0.12.2(latest release) and solr 4.10.3.
> But getting exactly same error as reported in this jira 1876.
> 
> Cloned master-snapshot from git https://github.com/apache/mahout/tree/master
> 
> Tried mvn clean install but it requires maven version 3.3?
> error "Detected Maven Version: 3.2.5 is not in the allowed range [3.3.3,)."
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:14 AM, Raviteja Lokineni <
> raviteja.lokin...@gmail.com> wrote:
> 
>> FYI, the versions quoted are for SNAPSHOT. They will be available in 13.0
>> probably, as per the below ticket.
>> 
>> https://issues.apache.org/jira/browse/MAHOUT-1876
>> 
>>> On Mon, Sep 12, 2016 at 6:17 PM, Suneel Marthi <smar...@apache.org> wrote:
>>> 
>>> It's Lucene 5.5.2.
>>> 
>>> Solr 6.0 and above mandate Java 8.
>>> 
>>>> On Tue, Sep 13, 2016 at 12:04 AM, Reth RM <reth.ik...@gmail.com> wrote:
>>>> 
>>>> What is the latest lucene index version that is supported?
>>>> 
>>>> trying to generate lucene vectors, index created using solr 4.10.2 and
>>>> solr 6.0 apis.
>>>> 
>>>> command
>>>> 
>>>>> ./mahout lucene.vector --dir usr/test/solr-4.10.2/example/solr/collection1/data/index \
>>>>>   --output /user/test/part-out.vec --field description --idField id \
>>>>>   --dictOut /user/test/dict.txt
>>>> 
>>>> 
>>>> 
>>>> Exception in thread "main" org.apache.lucene.index.IndexFormatTooNewException:
>>>> Format version is not supported (resource:
>>>> NIOFSIndexInput(path="/user/test/solr-4.10.2/example/solr/collection1/data/index/segments.gen")): -3
>>>> (needs to be between -2 and -2)
>> 
>> 
>> 
>> --
>> *Raviteja Lokineni* | Business Intelligence Developer
>> TD Ameritrade
>> 
>> E: raviteja.lokin...@gmail.com
>> 
>> 
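The Maven error quoted above ("allowed range [3.3.3,)") means the snapshot build wants Maven 3.3.3 or newer. A minimal sketch for comparing versions before attempting the build; the `version_at_least` helper is hypothetical (not part of Maven or Mahout) and assumes GNU `sort -V` is available:

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on GNU sort's -V (version sort).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: the version from the error message against the required minimum.
if version_at_least "3.2.5" "3.3.3"; then
  echo "Maven is new enough"
else
  echo "Maven is too old; upgrade to 3.3.3 or later"
fi
```

The installed version can be read from `mvn -version` and passed in as the first argument.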


Re: Supported Lucene Index Version

2016-09-12 Thread Suneel Marthi
It's Lucene 5.5.2.

Solr 6.0 and above mandate Java 8.

On Tue, Sep 13, 2016 at 12:04 AM, Reth RM  wrote:

> What is the latest lucene index version that is supported?
>
> trying to generate lucene vectors, index created using solr 4.10.2 and solr
> 6.0 apis.
>
> command
>
> > ./mahout lucene.vector --dir usr/test/solr-4.10.2/example/solr/collection1/data/index \
> >   --output /user/test/part-out.vec --field description --idField id \
> >   --dictOut /user/test/dict.txt
>
>
>
> Exception in thread "main" org.apache.lucene.index.IndexFormatTooNewException:
> Format version is not supported (resource:
> NIOFSIndexInput(path="/user/test/solr-4.10.2/example/solr/collection1/data/index/segments.gen")): -3
> (needs to be between -2 and -2)
>


Re: AbstractJob class not found exception

2016-08-16 Thread Suneel Marthi
Which Mahout version are you running?

On Tue, Aug 16, 2016 at 7:10 AM, Lee S  wrote:

> I am trying to run a local Mahout job from my main function,
>
> but when I execute it, it fails with this exception:
>
> java.lang.NoClassDefFoundError: org/apache/mahout/common/AbstractJob
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>
> I have added mahout-mr to my pom; do I need to add any other dependency?
>
> Can anybody help? Thanks very much.
>


Re: Text clustering how to?

2016-07-27 Thread Suneel Marthi
You did get a reply via JIRA; please stop spamming the Mahout and OpenNLP
mailing lists with the same question.
The book you are looking at, 'Taming Text', is from 2011-12, and both the
OpenNLP and Mahout projects have long diverged from the book.

If you are following the book for your learning, you may be better off
learning on your own from the project.

On Wed, Jul 27, 2016 at 7:33 PM, Dmitriy Lyubimov  wrote:

> I think you have got a reply via jira.
>
> On Wed, Jul 27, 2016 at 10:50 AM, Raviteja Lokineni <
> raviteja.lokin...@gmail.com> wrote:
>
> > Anybody?
> >
> > On Thu, Jul 21, 2016 at 10:42 AM, Raviteja Lokineni <
> > raviteja.lokin...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I am pretty new to Apache Mahout. I am trying to figure out how to do
> > text
> > > clustering, I was following the book Taming Text (Manning). Looking at
> > the
> > > book I tried to run Mahout and stumbled upon a version incompatibility
> > with
> > > latest Lucene indexes. I therefore opened up:
> > > https://issues.apache.org/jira/browse/MAHOUT-1876
> > >
> > > Looks like the code responsible for doing what I needed to do is in
> > legacy
> > > map reduce code. Is there any supported(which is not deprecated or
> > legacy)
> > > approach to achieve what I am supposed to do?
> > >
> > > Was wondering if someone would push / kick me in the right direction ☺.
> > >
> > > Thanks,
> > > --
> > > *Raviteja Lokineni* | Business Intelligence Developer
> > > TD Ameritrade
> > >
> > > E: raviteja.lokin...@gmail.com
> > >
> > >
> > >
> >
> >
> > --
> > *Raviteja Lokineni* | Business Intelligence Developer
> > TD Ameritrade
> >
> > E: raviteja.lokin...@gmail.com
> >
> >
>


[ANNOUNCE] Apache Mahout 0.12.2 Release

2016-06-13 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.12.2
which is a minor release following 0.12.1 in May 2016.


Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

Having evaluated Java Swing based plotting capabilities, the project has
decided that, currently, the best direction for data visualization is to
integrate with Apache Zeppelin.

Mahout developers are working towards a Mahout Zeppelin interpreter for
visualization and other notebook capabilities [1].

Mahout 0.12.2 is a maintenance release over Mahout 0.12.1 mainly focused on
providing helper functions for Mahout-Zeppelin integration with a couple of
minor bug fixes [2].


   1. MAHOUT-1866 Add matrix-to-tsv string function.
   2. MAHOUT-1863 cluster-syntheticcontrol.sh errors out with "Input path does
      not exist".
   3. MAHOUT-1868 purge smile plotting from the codebase.



Many thanks to all Apache committers and contributors.  Special thanks to
Trevor Grant and Albert Chu.


Future Roadmap:


   1. Many Online and Batch Algorithm additions.
   2. Support for Native Optimizations.
   3. Performance enhancements for Samsara Framework.
   4. Performance enhancements for Algebraic Operations.
   5. Integration of Mahout DRMs with Apache Arrow.


[1] https://issues.apache.org/jira/browse/ZEPPELIN-116

[2] https://issues.apache.org/jira/browse/MAHOUT-1868?jql=project%20%3D%20MAHOUT%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.13.0


Re: [VOTE] Mahout 0.12.2 Release Candidate 2

2016-06-13 Thread Suneel Marthi
This VOTE has passed with three +1s and is now officially closed; I will send
an announcement once the release is finalized.

On Fri, Jun 10, 2016 at 10:47 PM, Andrew Palumbo <ap@outlook.com> wrote:

> +1 (binding) that is..  per last email tested MR wikipedia example and
> spark document classifier without issue.
>
>  Original message 
> From: Andrew Palumbo <ap@outlook.com>
> Date: 06/10/2016 10:44 PM (GMT-05:00)
> To: d...@mahout.apache.org, user@mahout.apache.org
> Subject: RE: [VOTE] Mahout 0.12.2 Release Candidate 2
>
> +1 ran classify-wikipedia.sh MR script, launched shell and ran
> spark-document-classifier.mscala in standalone cluster mode.
>
>  Original message 
> From: Andrew Musselman <andrew.mussel...@gmail.com>
> Date: 06/10/2016 9:23 PM (GMT-05:00)
> To: user@mahout.apache.org
> Cc: mahout <d...@mahout.apache.org>
> Subject: Re: [VOTE] Mahout 0.12.2 Release Candidate 2
>
> Signatures and hashes are correct; +1 (binding).
>
> On Fri, Jun 10, 2016 at 6:05 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > Verified {bin} * {zip,tar} - ran tests, tests pass
> >
> >
>


Re: [VOTE] Mahout 0.12.2 Release Candidate 2

2016-06-10 Thread Suneel Marthi
Verified {bin} * {zip,tar} - ran tests, tests pass

Verified {src} * {zip,tar} - ran tests, tests pass

Here's my +1 (binding)

On Fri, Jun 10, 2016 at 8:59 PM, Suneel Marthi <smar...@apache.org> wrote:

> This is the vote for release 0.12.2 of Apache Mahout.
>
> The vote will be going for at least 72 hours and will be closed on Sunday,
> June 12th, 2016 or once there are at least 3 PMC +1 binding votes
> (whichever occurs earlier).  Please download, test and vote with
>
> [ ] +1, accept RC as the official 0.12.2 release of Apache Mahout
> [ ] +0, I don't care either way,
> [ ] -1, do not accept RC as the official 0.12.2 release of Apache Mahout,
> because...
>
>
> Maven staging repo:
>
>  https://repository.apache.org/content/repositories/orgapachemahout-1025/
> <https://repository.apache.org/content/repositories/orgapachemahout-1025/>
>
> The git tag to be voted upon is mahout-0.12.2
>


[VOTE] Mahout 0.12.2 Release Candidate 2

2016-06-10 Thread Suneel Marthi
This is the vote for release 0.12.2 of Apache Mahout.

The vote will be going for at least 72 hours and will be closed on Sunday,
June 12th, 2016 or once there are at least 3 PMC +1 binding votes
(whichever occurs earlier).  Please download, test and vote with

[ ] +1, accept RC as the official 0.12.2 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.12.2 release of Apache Mahout,
because...


Maven staging repo:

 https://repository.apache.org/content/repositories/orgapachemahout-1025/


The git tag to be voted upon is mahout-0.12.2


Re: [VOTE] Apache Mahout 0.12.2 Release Candidate

2016-06-10 Thread Suneel Marthi
This VOTE is cancelled in favor of another release candidate: we uncovered an
issue with the Spark shell. I will announce a new RC in a short while.

On Fri, Jun 10, 2016 at 7:01 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Signatures and hashes look good; built from source tarball and all tests
> pass.
>
> +1 (binding)
>
> On Fri, Jun 10, 2016 at 2:25 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > Verified {bin} * {zip,tar} - ran tests, tests pass
> >
> > Verified {src} * {zip,tar} - ran tests, tests pass
> >
> > Here's my +1 (binding)
> >
> > On Fri, Jun 10, 2016 at 5:12 PM, Suneel Marthi <smar...@apache.org>
> wrote:
> >
> > > This is the vote for release 0.12.2 of Apache Mahout.
> > >
> > > The vote will be going for at least 72 hours and will be closed on
> > > Sunday, June 12th, 2016 or once there are at least 3 PMC +1 binding
> > > votes (whichever occurs earlier).  Please download, test and vote with
> > >
> > > [ ] +1, accept RC as the official 0.12.2 release of Apache Mahout
> > > [ ] +0, I don't care either way,
> > > [ ] -1, do not accept RC as the official 0.12.2 release of Apache
> Mahout,
> > > because...
> > >
> > >
> > > Maven staging repo:
> > >
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1024/
> > >
> > > The git tag to be voted upon is mahout-0.12.2
> > >
> >
>


Re: [VOTE] Apache Mahout 0.12.2 Release Candidate

2016-06-10 Thread Suneel Marthi
Verified {bin} * {zip,tar} - ran tests, tests pass

Verified {src} * {zip,tar} - ran tests, tests pass

Here's my +1 (binding)

On Fri, Jun 10, 2016 at 5:12 PM, Suneel Marthi <smar...@apache.org> wrote:

> This is the vote for release 0.12.2 of Apache Mahout.
>
> The vote will be going for at least 72 hours and will be closed on Sunday,
> June 12th, 2016 or once there are at least 3 PMC +1 binding votes
> (whichever occurs earlier).  Please download, test and vote with
>
> [ ] +1, accept RC as the official 0.12.2 release of Apache Mahout
> [ ] +0, I don't care either way,
> [ ] -1, do not accept RC as the official 0.12.2 release of Apache Mahout,
> because...
>
>
> Maven staging repo:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1024/
>
> The git tag to be voted upon is mahout-0.12.2
>


Re: Stickers

2016-06-02 Thread Suneel Marthi
The MapR folks were referring me to stickermule for customized stickers, so
it's great to see that most Apache projects are already working with stickermule.



On Thu, Jun 2, 2016 at 10:12 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> The link to buy:
> https://www.stickermule.com/en/marketplace/13179-apache-mahout
>
> On Thu, Jun 2, 2016 at 7:01 PM, Andrew Musselman <
> andrew.mussel...@gmail.com
> > wrote:
>
> >
> >
> https://www.stickermule.com/artworks/755889?token=cded79155151fd30df04ad7f2a37cbc3
> >
> > On Thu, Jun 2, 2016 at 6:24 PM, Andrew Musselman <
> > andrew.mussel...@gmail.com> wrote:
> >
> >> Ordered a hundred; will post the proof when it's ready.
> >>
> >> On Thu, Jun 2, 2016 at 6:19 PM, Andrew Musselman <
> >> andrew.mussel...@gmail.com> wrote:
> >>
> >>> How many to start with, 100?
> >>>
> >>> On Thu, Jun 2, 2016 at 6:16 PM, Suneel Marthi <suneel.mar...@gmail.com
> >
> >>> wrote:
> >>>
> >>>> The one pointed below is what I would stick on my laptop
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> > On Jun 2, 2016, at 9:12 PM, Andrew Musselman <
> >>>> andrew.mussel...@gmail.com> wrote:
> >>>> >
> >>>> > Yeah there's also a "powered by" version with that in the mahout src
> >>>> at
> >>>> > src/main/images/logos.
> >>>> >
> >>>> > On Thu, Jun 2, 2016 at 6:07 PM, Suneel Marthi <
> >>>> suneel.mar...@gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> >> Yup that's the one
> >>>> >>
> >>>> >> Sent from my iPhone
> >>>> >>
> >>>> >>> On Jun 2, 2016, at 9:06 PM, Andrew Palumbo <ap@outlook.com>
> >>>> wrote:
> >>>> >>>
> >>>> >>> this guy?
> >>>> >>
> >>>>
> https://svn.apache.org/repos/asf/mahout/site/mahout_cms/trunk/content/images/mahout-logo.svg
> >>>> >>>
> >>>> >>> 
> >>>> >>> From: Andrew Musselman <andrew.mussel...@gmail.com>
> >>>> >>> Sent: Thursday, June 2, 2016 6:53:42 PM
> >>>> >>> To: d...@mahout.apache.org; Justin Dorfman
> >>>> >>> Subject: Stickers
> >>>> >>>
> >>>> >>> Team, I'd like to introduce Justin Dorfman, who's been
> volunteering
> >>>> to
> >>>> >> help
> >>>> >>> projects with producing stickers, and has been talking with Sally
> >>>> >> Khudairi
> >>>> >>> about how to help projects out in general with marketing.
> >>>> >>>
> >>>> >>> Justin, could you let us know how to go about it? We have logo art
> >>>> >> already
> >>>> >>> unless the team felt like updating it.
> >>>> >>>
> >>>> >>> Thanks!
> >>>> >>
> >>>>
> >>>
> >>>
> >>
> >
>


Re: Welcome Trevor Grant as a new Mahout Committer

2016-05-24 Thread Suneel Marthi
Welcome Trevor !!! Kokanee Cheers !!

On Mon, May 23, 2016 at 8:39 PM, Andrew Palumbo  wrote:

> In recognition of Trevor Grant's contributions to the Mahout project
> notably his Zeppelin Integration work, the PMC has invited and is pleased
> to announce that he has accepted our invitation to join the Mahout project
> as a committer.
>
> As is customary, I will leave it to Trevor to provide a little bit of
> background about himself.
>
> Congratulations and Welcome!
>
> -Andrew Palumbo
> On Behalf of the Mahout PMC
>


[ANNOUNCE] Apache Mahout 0.12.1 Release

2016-05-18 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.12.1,
which is a minor release following the 0.12.0 release on April 11, 2016.

Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

Mahout 0.12.1 is a maintenance release over Mahout 0.12.0 that addresses the
following issues with Apache Flink integration:

MAHOUT-1859:  Disable non working msurf and mgrid before Mahout 0.12.1
release

MAHOUT-1848:  drmSampleKRows in FlinkEngine should generate a dense or
sparse matrix

MAHOUT-1847: drmSampleRows in FlinkEngine doesn't wrap Int Keys when
ClassTag is of type Int

MAHOUT-1841: Matrices.symmetricUniformView(...) returning values in the
wrong range.

MAHOUT-1836: Order and add missing parameters for
DictionaryVectorizer.createTermFrequencyVectors() javadoc parameter
comments.

MAHOUT-1835: Remove countsPerPartition in Flink/blas/package.scala

MAHOUT-1834: Setup Travis CI for Mahout

MAHOUT-1833: Enhance svec function to accept cardinality as parameter

MAHOUT-1832: Upgrade Jackson version and references to 2.x

MAHOUT-1827: Suggested changes to homepage, how to contribute

Upgrade to Apache Flink 1.0.3

Experimental Mahout 2d and 3d plotting

Many thanks to all Apache committers and contributors.  Special thanks to
Shane Curcuru, Edmond Luo and  for their contributions.

Future Roadmap:


   1. Zeppelin integration for Mahout on Spark.
   2. Plotting Capabilities for Mahout matrices and DRMs.
   3. Many Online and Batch Algorithm additions.
   4. Support for Native Optimizations.
   5. Performance enhancements for Samsara Framework.
   6. Performance enhancements for Algebraic Operations.


Re: [VOTE] Apache Mahout 0.12.1 Release

2016-05-18 Thread Suneel Marthi
Thanks, we have three +1s and no -1s. This vote is officially closed; I will
make an announcement once the release is finalized.

On Wed, May 18, 2016 at 7:36 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Sigs and hashes are good; +1 (binding).
>
> On Wed, May 18, 2016 at 3:53 PM, Andrew Palumbo <ap@outlook.com>
> wrote:
>
> > +1 (binding) tested a clean source build.
> >
> > ________
> > From: Suneel Marthi <smar...@apache.org>
> > Sent: Wednesday, May 18, 2016 6:23:57 PM
> > To: mahout; user@mahout.apache.org
> > Subject: Re: [VOTE] Apache Mahout 0.12.1 Release
> >
> > Verified {src} * {tar, zip}
> >
> > Ran a clean build and tests and see no issues
> >
> > +1 (binding)
> >
> > On Wed, May 18, 2016 at 6:07 PM, Suneel Marthi <smar...@apache.org>
> wrote:
> >
> > > This is the vote for release 0.12.1 of Apache Mahout.
> > >
> > > The vote will be going for at least 72 hours and will be closed on
> > > Saturday, May 21st, 2016.  Please download, test and vote with
> > >
> > > [ ] +1, accept RC as the official 0.12.1 release of Apache Mahout
> > > [ ] +0, I don't care either way,
> > > [ ] -1, do not accept RC as the official 0.12.1 release of Apache
> Mahout,
> > > because...
> > >
> > >
> > > Maven staging repo:
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1023
> > >
> > > The git tag to be voted upon is release-0.12.1
> > >
> >
>


Re: [VOTE] Apache Mahout 0.12.1 Release

2016-05-18 Thread Suneel Marthi
Verified {src} * {tar, zip}

Ran a clean build and tests and see no issues

+1 (binding)

On Wed, May 18, 2016 at 6:07 PM, Suneel Marthi <smar...@apache.org> wrote:

> This is the vote for release 0.12.1 of Apache Mahout.
>
> The vote will be going for at least 72 hours and will be closed on
> Saturday, May 21st, 2016.  Please download, test and vote with
>
> [ ] +1, accept RC as the official 0.12.1 release of Apache Mahout
> [ ] +0, I don't care either way,
> [ ] -1, do not accept RC as the official 0.12.1 release of Apache Mahout,
> because...
>
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachemahout-1023
>
> The git tag to be voted upon is release-0.12.1
>


[VOTE] Apache Mahout 0.12.1 Release

2016-05-18 Thread Suneel Marthi
This is the vote for release 0.12.1 of Apache Mahout.

The vote will be going for at least 72 hours and will be closed on
Saturday, May 21st, 2016.  Please download, test and vote with

[ ] +1, accept RC as the official 0.12.1 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.12.1 release of Apache Mahout,
because...


Maven staging repo:
https://repository.apache.org/content/repositories/orgapachemahout-1023

The git tag to be voted upon is release-0.12.1


Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
That's correct: it has been deprecated as of Feb 2014 and will be completely
purged in one of the upcoming releases (0.13.0).

On Thu, Apr 28, 2016 at 2:10 PM, Dmitriy Lyubimov  wrote:

> Prakash,
>
> if you are using any Mahout Mapreduce algorithm for research, please make
> sure to make this disclosure:
>
> all Mahout MapReduce algorithms are officially not supported and deprecated
> since February, 2014 (IIRC). I can dig up a specific issue regarding this.
> There also has been an announcement.
>
> So before you really start drawing any comparisons, please be advised that
> you are starting with algorithms 2+ years past their EOL (let alone
> inception).
>
> Thanks.
> -D
>
> On Thu, Apr 28, 2016 at 11:05 AM, Prakash Poudyal <
> prakashpoud...@gmail.com>
> wrote:
>
> > Hi! Ted,
> >
> > You mean Mahout no longer supports "fuzzy K clustering for the
> > sentences"? Can you clarify in more detail? :(
> >
> > Prakash
> >
> > On Thu, Apr 28, 2016 at 6:58 PM, Ted Dunning 
> > wrote:
> >
> > > On Thu, Apr 28, 2016 at 10:54 AM, Prakash Poudyal <
> > > prakashpoud...@gmail.com>
> > > wrote:
> > >
> > > > Actually, I need to use fuzzy clustering to cluster the sentence in
> my
> > > > research. I found  fuzzy k clustering algorithm in Apache Mahout,
> > thus, I
> > > > am trying to use it for my purpose.
> > > >
> > >
> > > That's great.
> > >
> > > But that code is no longer supported.
> > >
> >
> >
> >
> > --
> >
> > Regards
> > Prakash Poudyal
> >
>


Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
Yes, the entire MapReduce code (which includes the fuzzy clustering that you
are looking at) is not supported anymore as of Mahout 0.10.0 (I suggest reading
the release notes on mahout.apache.org).


On Thu, Apr 28, 2016 at 2:05 PM, Prakash Poudyal 
wrote:

> Hi! Ted,
>
> You mean Mahout no longer supports "fuzzy K clustering for the
> sentences"? Can you clarify in more detail? :(
>
> Prakash
>
> On Thu, Apr 28, 2016 at 6:58 PM, Ted Dunning 
> wrote:
>
> > On Thu, Apr 28, 2016 at 10:54 AM, Prakash Poudyal <
> > prakashpoud...@gmail.com>
> > wrote:
> >
> > > Actually, I need to use fuzzy clustering to cluster the sentence in my
> > > research. I found  fuzzy k clustering algorithm in Apache Mahout,
> thus, I
> > > am trying to use it for my purpose.
> > >
> >
> > That's great.
> >
> > But that code is no longer supported.
> >
>
>
>
> --
>
> Regards
> Prakash Poudyal
>


Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
On Thu, Apr 28, 2016 at 1:54 PM, Prakash Poudyal <prakashpoud...@gmail.com>
wrote:

> Dear Suneel,
>
> Thank you so much for your reply, I was waiting for long time.
>
> Actually, I need to use fuzzy clustering to cluster the sentence in my
> research. I found  fuzzy k clustering algorithm in Apache Mahout, thus, I
> am trying to use it for my purpose.
>
> Regarding your "first thing" reply: if I cannot get answers about what I am
> doing, then I may be heading in the wrong direction. Can you give me some
> guidance on the requirement I mentioned above?
>

What I meant to convey was that you have not been seeing responses to your
question because this is all legacy MR code that's not supported anymore.

>
> Next, about -c centroids: you mean we get the -c centroids only after we
> execute the Clustering Driver? If you know of a helpful link, could you share it?
>

I suggest you look at the code as opposed to just reading someone's blog
instructions.  It should give you a better understanding of the
implementation details.

In the CLI that you are running, -c is the folder for the generated
centroids. I suggest you look at the code to see how that's being done.

Feel free to pose more questions.


> Thank you so much. I have been stuck for the last two days. I hope you will
> reply soon.
>
> Prakash
>
>
> On Thu, Apr 28, 2016 at 6:26 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > First thing, most of this code is legacy MapReduce and is not supported
> > anymore. Hence you are not seeing answers.
> >
> > Back to your question: -c specifies the folder for the initial centroids
> > that are randomly generated.  IIRC, the centroids are generated when you
> > execute the Clustering Driver.
> >
> >
> > On Wed, Apr 27, 2016 at 1:57 PM, Prakash Poudyal <
> prakashpoud...@gmail.com
> > >
> > wrote:
> >
> > > Hi!
> > >
> > > I am using fuzzy clustering, but I could not understand "  -c
> > > reuters-fkmeans-centroids  ". How to calculate this ?
> > >
> > >
> > > $ /bin/mahout fkmeans -i reuters-vectors/tfidf-vectors/ -c
> > > reuters-fkmeans-centroids -o reuters-fkmeans-clusters -cd 1.0 -k 21 -m
> 2
> > > -ow -x 10 -dm
> > > org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
> > >
> > > --
> > >
> > > Regards
> > > Prakash Poudyal
> > >
> >
>
>
>
> --
>
> Regards
> Prakash Poudyal
>


Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
First thing, most of this code is legacy MapReduce and is not supported
anymore. Hence you are not seeing answers.

Back to your question: -c specifies the folder for the initial centroids that
are randomly generated.  IIRC, the centroids are generated when you execute the
Clustering Driver.


On Wed, Apr 27, 2016 at 1:57 PM, Prakash Poudyal 
wrote:

> Hi!
>
> I am using fuzzy clustering, but I could not understand "  -c
> reuters-fkmeans-centroids  ". How to calculate this ?
>
>
> $ /bin/mahout fkmeans -i reuters-vectors/tfidf-vectors/ -c
> reuters-fkmeans-centroids -o reuters-fkmeans-clusters -cd 1.0 -k 21 -m 2
> -ow -x 10 -dm
> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>
> --
>
> Regards
> Prakash Poudyal
>
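To make the `-c` answer above concrete, here is the command from the question with the folders pulled out into variables. Only the meaning of `-c` comes from this thread; the other option meanings in the comments are my recollection of the legacy `fkmeans` CLI and should be treated as assumptions to check against its help output. The sketch only prints the command, it does not run Mahout:

```shell
INPUT=reuters-vectors/tfidf-vectors/     # -i: input vectors (assumed meaning)
CENTROIDS=reuters-fkmeans-centroids      # -c: folder for the initial centroids (per the reply above)
OUTPUT=reuters-fkmeans-clusters          # -o: output folder (assumed meaning)
DISTANCE=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure  # -dm: distance measure class

# Prints the invocation. -k 21 is assumed to be the number of clusters; per the
# thread, their initial centroids are generated randomly into $CENTROIDS when
# the clustering driver runs, so the folder does not need to be prepared by hand.
fkmeans_cmd() {
  echo "mahout fkmeans -i $INPUT -c $CENTROIDS -o $OUTPUT -cd 1.0 -k 21 -m 2 -ow -x 10 -dm $DISTANCE"
}

fkmeans_cmd
```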


Re: Mahout with Hadoop 2.2.0

2016-04-25 Thread Suneel Marthi
On Mon, Apr 25, 2016 at 9:16 AM, Nantia Makrynioti 
wrote:

> Thank for your replies!
>
> Mahout 0.12.0 from command line worked great.
>
> However, if I want to develop the same example with Java API, what version
> of mahout-core should I use in the pom.xml?
>
> I tried mahout-core 0.12.0, but it does not exist. With mahout-core 0.9 I
> got the same exception as in the previous email.
>

It should be mahout-mr, not mahout-core, since Mahout 0.10.0.

>
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
>
> The exception was thrown during training of Naive Bayes model. I use
> TrainNaiveBayesJob and ToolRunner classes to train the model.
>
>
>
>
>
> 2016-04-25 16:06 GMT+03:00 Alen Xia :
>
> > https://issues.apache.org/jira/browse/MAHOUT-1354
> >
> > Hope that helps :)
> >
> > Sent from my iPhone
> >
> > > On Apr 25, 2016, at 18:27, Nantia Makrynioti  wrote:
> > >
> > > Hello,
> > >
> > > I have installed Hadoop 2.2.0 on an Ubuntu 14.04 machine. I built
> Mahout
> > > 0.9 with
> > >
> > > mvn clean package -Dhadoop2.version=2.2.0
> > >
> > > but when I run the 20newsgroups classification example, I get the
> > following
> > > exception:
> > >
> > > java.lang.IncompatibleClassChangeError: Found interface
> > > org.apache.hadoop.mapreduce.JobContext, but class was expected
> > >
> > > How can I run Mahout with Hadoop 2.2.0?
> > >
> > > Thank you in advance,
> > > Nantia
> >
>
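The artifact rename mentioned in the reply above (mahout-core before 0.10.0, mahout-mr from 0.10.0 on) can be sketched as a small lookup. The `mahout_mr_artifact` helper is hypothetical and assumes GNU `sort -V` for the version comparison:

```shell
# Hypothetical helper: prints the Maven artifactId that holds the MapReduce
# code for Mahout version $1 (mahout-mr from 0.10.0 on, mahout-core before).
mahout_mr_artifact() {
  oldest=$(printf '%s\n%s\n' "0.10.0" "$1" | sort -V | head -n 1)
  if [ "$oldest" = "0.10.0" ]; then
    echo "mahout-mr"
  else
    echo "mahout-core"
  fi
}

mahout_mr_artifact "0.12.0"   # prints mahout-mr
mahout_mr_artifact "0.9"      # prints mahout-core
```

In a pom.xml, the dependency would then use groupId org.apache.mahout with that artifactId and the matching version.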


Re: Mahout with Hadoop 2.2.0

2016-04-25 Thread Suneel Marthi
Mahout 0.9 is not supported anymore and you shouldn't be using it.

The issue you are seeing is due to Mahout 0.9 not being compatible with Hadoop
2.x.

I suggest you try this example with Mahout 0.11.0 or 0.12.0.



On Mon, Apr 25, 2016 at 6:27 AM, Nantia Makrynioti 
wrote:

> Hello,
>
> I have installed Hadoop 2.2.0 on an Ubuntu 14.04 machine. I built Mahout
> 0.9 with
>
> mvn clean package -Dhadoop2.version=2.2.0
>
> but when I run the 20newsgroups classification example, I get the following
> exception:
>
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
>
> How can I run Mahout with Hadoop 2.2.0?
>
> Thank you in advance,
> Nantia
>


Congratulations to our new Chair

2016-04-20 Thread Suneel Marthi
Please join me in congratulating Andrew Palumbo on becoming our new Project
Chair.

As for me, it was a pleasure to serve as Chair starting with the Mahout
0.10.0 release and ending with the recent 0.12.0 release, and perhaps we
will do it again someday.

Congrats again, Andy!


[ANNOUNCE] Apache Mahout 0.12.0 Release

2016-04-11 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.12.0.

Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

The Mahout Math environment we call “Samsara” for its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs on Spark, Flink
and H2O.

The Mahout 0.12.0 release marks a major milestone for the “Samsara”
environment’s goal of providing an engine neutral math platform by now
supporting Flink.  While still experimental, the Mahout Flink bindings now
offer all of the R-Like semantics for linear algebra operations, matrix
decompositions, and algorithms of the “Samsara” platform for execution on a
Flink back-end.

This gives users of Flink out of the box access to the following features
(and more):


   1. The Mahout Distributed Row Matrix (DRM) API.
   2. Distributed and local Vector and Matrix algebra routines.
   3. Distributed and local Stochastic Principal Component Analysis.
   4. Distributed and local Stochastic Singular Value Decomposition.
   5. Distributed and local Thin QR Decomposition.
   6. Collaborative Filtering.
   7. Naive Bayes Classification.
   8. Matrix operations (only listing a few here):
      1. Mahout-native blockified distributed Matrix map and allreduce
         routines.
      2. Distributed data point (row) sampling.
      3. Matrix/Matrix Squared Distance.
      4. Element-wise log.
      5. Element-wise roots.
      6. Element-wise Matrix/Matrix addition, subtraction, division and
         multiplication.
      7. Functional Matrix value assignment.
   9. A familiar Scala-based R-like DSL.


As well as tools to develop other mathematical and machine learning
algorithms.

To get started with Apache Mahout 0.12.0, download the release artifacts
and signatures from 
http://www.apache.org/dist/mahout/0.12.0/.
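
Before using downloaded artifacts it is worth checking them against the published digests and signatures. A minimal sketch of a digest check, assuming a SHA-256 digest and GNU `sha256sum`; the `digest_matches` helper is hypothetical, and the tool should be swapped for whichever digest type the release directory actually publishes:

```shell
# Hypothetical helper: succeeds when the SHA-256 of file $1 equals hex digest $2.
digest_matches() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  [ "$actual" = "$2" ]
}

# Self-contained demonstration on a throwaway file containing "abc".
demo_file=$(mktemp)
printf 'abc' > "$demo_file"
if digest_matches "$demo_file" "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"; then
  echo "digest OK"
else
  echo "digest MISMATCH"
fi
rm -f "$demo_file"
```

For a detached signature, `gpg --verify <file>.asc <file>` performs the corresponding signature check once the release signer's public key has been imported.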

Many thanks to the contributors and committers who were part of this
release. Thanks in particular to Till Rohrmann, Alexey Grigorev, Robert
Metzger, Stephan Ewen, and Kostas Tzoumas, members of Data Artisans and the
Flink community who helped in this effort significantly. Please see below
for the Release Highlights.

RELEASE HIGHLIGHTS

This is a major release over Mahout 0.11.2 meant to introduce Apache Flink (
http://flink.apache.org) as a backend execution engine to the Samsara
Linear Algebra framework.

For more information about “Samsara” on Flink see: (
http://mahout.apache.org/users/flinkbindings/flink-internals.html) and (
http://mahout.apache.org/users/flinkbindings/playing-with-samsara-flink.html
)

Mahout 0.12.0 is based on Apache Flink 1.0.1 (
http://flink.apache.org/news/2016/04/06/release-1.0.1.html
)

Mahout 0.12.0 now supports Flink 1.0.1 and Spark 1.5.2 on Hadoop 2.4.1.

KNOWN ISSUES

   1. Mahout’s DRM checkpointing is not fully supported in this release and
      the DrmLike.checkpoint(CacheHint.CacheHint) contract is broken.
      Currently checkpoints are cached to a temporary file system as
      designated by the `taskmanager.tmp.dirs` property in the
      `$MAHOUT_HOME/conf/flink-config.yaml` file.  This issue affects the
      performance of Mahout on Flink.
   2. Serialization issues have arisen with certain operations. As the Flink
      Bindings are still experimental, we’ve allowed these issues to pass the
      release, and will be addressing them in a follow-up 0.12.1 maintenance
      release.  These issues affect the performance of Mahout on Flink.
   3. Highly iterative Mahout algorithms are currently significantly slowed
      by issue (1).



Fixed Jiras:

This release addresses 35 issues [1] of which 14 are bug fixes [2].

Future Roadmap:

1. Mahout 0.12.1 will support a Flink shell.

2. Several optimizations will be made to the Mahout Flink-Bindings in
Mahout 0.12.1, specifically to overcome the performance issues noted in the
Known Issues section above.

3. We will be exploring native Mahout caching for Flink.

4. Explore leveraging ViennaCL (
http://viennacl.sourceforge.net/doc/manual-license.html) as a math backend
to support dense, sparse and CUDA computations on bare metal.



[1]

Re: [VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-11 Thread Suneel Marthi
Thanks all. We have 3 +1 (binding) votes and no minuses. This release has
passed and the vote is officially closed.
I'll send an announcement out when the release is finalized.



On Mon, Apr 11, 2016 at 2:17 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Checked sigs and hashes, spot-checked examples, and exercised the spark
> shell.
>
> +1 (binding)
>
> On Mon, Apr 11, 2016 at 11:08 AM, Andrew Palumbo <ap@outlook.com>
> wrote:
>
> > ran through mapreduce/spark examples from the binary .tar.gz distro. ran
> > the spark-shell and tested the spark-document-classifier.mscala script.
> > +1
> >
> > 
> > From: Andrew Musselman <andrew.mussel...@gmail.com>
> > Sent: Monday, April 11, 2016 12:43 PM
> > To: d...@mahout.apache.org
> > Cc: user@mahout.apache.org
> > Subject: Re: [VOTE] Apache Mahout 0.12.0 Release Candidate
> >
> > Sigs and hashes are correct, running a build and examples next.
> >
> > On Mon, Apr 11, 2016 at 8:38 AM, Suneel Marthi <smar...@apache.org>
> wrote:
> >
> > > Ran a complete build on  {src} * {zip, tar} and verified that all tests
> > > pass.
> > >
> > > Tested Spark Shell
> > >
> > > All Flink tests pass
> > >
> > > +1 (binding)
> > >
> > > On Mon, Apr 11, 2016 at 8:44 AM, Suneel Marthi <smar...@apache.org>
> > wrote:
> > >
> > > > Correction to previous message
> > > > --
> > > >
> > > > This is a vote for release 0.12.0 of Apache Mahout that adds Apache
> > Flink
> > > > as an execution engine to the Samsara Linear Algebra framework.
> > > >
> > > > The vote will run for 24 hours and will be closed on Tuesday,
> > > > April 12th, 2016.  Please download, test and vote with
> > > >
> > > > [ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
> > > > [ ] +0, I don't care either way,
> > > > [ ] -1, do not accept RC as the official 0.12.0 release of Apache
> > Mahout,
> > > > because...
> > > >
> > > >
> > > > Maven staging repo:
> > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapachemahout-1022/
> > > >
> > > > The git tag to be voted upon is mahout-0.12.0
> > > >
> > > > On Mon, Apr 11, 2016 at 8:41 AM, Suneel Marthi <smar...@apache.org>
> > > wrote:
> > > >
> > > >> This is a vote for release 0.12.0 of Apache Mahout that adds Apache
> > > Flink
> > > >> as an execution engine to the Samsara Linear Algebra framework.
> > > >>
> > > >> The vote will run for 24 hours and will be closed on Monday,
> > > >> April 12th, 2016.  Please download, test and vote with
> > > >>
> > > >> [ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
> > > >> [ ] +0, I don't care either way,
> > > >> [ ] -1, do not accept RC as the official 0.12.0 release of Apache
> > > Mahout,
> > > >> because...
> > > >>
> > > >>
> > > >> Maven staging repo:
> > > >>
> > > >>
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1022/
> > > >>
> > > >> The git tag to be voted upon is mahout-0.12.0
> > > >>
> > > >
> > > >
> > >
> >
>


Re: [VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-11 Thread Suneel Marthi
Ran a complete build on  {src} * {zip, tar} and verified that all tests
pass.

Tested Spark Shell

All Flink tests pass

+1 (binding)

On Mon, Apr 11, 2016 at 8:44 AM, Suneel Marthi <smar...@apache.org> wrote:

> Correction to previous message
> --
>
> This is a vote for release 0.12.0 of Apache Mahout that adds Apache Flink
> as an execution engine to the Samsara Linear Algebra framework.
>
> The vote will run for 24 hours and will be closed on Tuesday,
> April 12th, 2016.  Please download, test and vote with
>
> [ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
> [ ] +0, I don't care either way,
> [ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
> because...
>
>
> Maven staging repo:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1022/
>
> The git tag to be voted upon is mahout-0.12.0
>
> On Mon, Apr 11, 2016 at 8:41 AM, Suneel Marthi <smar...@apache.org> wrote:
>
>> This is a vote for release 0.12.0 of Apache Mahout that adds Apache Flink
>> as an execution engine to the Samsara Linear Algebra framework.
>>
>> The vote will run for 24 hours and will be closed on Monday,
>> April 12th, 2016.  Please download, test and vote with
>>
>> [ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
>> [ ] +0, I don't care either way,
>> [ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
>> because...
>>
>>
>> Maven staging repo:
>>
>> https://repository.apache.org/content/repositories/orgapachemahout-1022/
>>
>> The git tag to be voted upon is mahout-0.12.0
>>
>
>


Re: [VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-11 Thread Suneel Marthi
Correction to previous message
--

This is a vote for release 0.12.0 of Apache Mahout that adds Apache Flink
as an execution engine to the Samsara Linear Algebra framework.

The vote will run for 24 hours and will be closed on Tuesday,
April 12th, 2016.  Please download, test and vote with

[ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
because...


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1022/

The git tag to be voted upon is mahout-0.12.0

On Mon, Apr 11, 2016 at 8:41 AM, Suneel Marthi <smar...@apache.org> wrote:

> This is a vote for release 0.12.0 of Apache Mahout that adds Apache Flink
> as an execution engine to the Samsara Linear Algebra framework.
>
> The vote will run for 24 hours and will be closed on Monday,
> April 12th, 2016.  Please download, test and vote with
>
> [ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
> [ ] +0, I don't care either way,
> [ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
> because...
>
>
> Maven staging repo:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1022/
>
> The git tag to be voted upon is mahout-0.12.0
>


[VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-11 Thread Suneel Marthi
This is a vote for release 0.12.0 of Apache Mahout that adds Apache Flink
as an execution engine to the Samsara Linear Algebra framework.

The vote will run for 24 hours and will be closed on Monday,
April 12th, 2016.  Please download, test and vote with

[ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
because...


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1022/

The git tag to be voted upon is mahout-0.12.0


Re: [VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-10 Thread Suneel Marthi
Rolling back the Release Candidate, will put up a new RC in an hour.

On Mon, Apr 11, 2016 at 1:15 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> -1 Problem found during testing.
>
> On Sun, Apr 10, 2016 at 7:29 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > This is the vote for release 0.12.0 of Apache Mahout that adds Apache
> Flink
> > as an execution engine to the Samsara Linear Algebra framework.
> >
> > The vote will run for 24 hours and will be closed on Monday,
> > April 12th, 2016.  Please download, test and vote with
> >
> > [ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
> > [ ] +0, I don't care either way,
> > [ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
> > because...
> >
> >
> > Maven staging repo:
> >
> > https://repository.apache.org/content/repositories/orgapachemahout-1021/
> >
> > The git tag to be voted upon is mahout-012.0
> >
>


[VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-10 Thread Suneel Marthi
This is the vote for release 0.12.0 of Apache Mahout that adds Apache Flink
as an execution engine to the Samsara Linear Algebra framework.

The vote will run for 24 hours and will be closed on Monday,
April 12th, 2016.  Please download, test and vote with

[ ] +1, accept RC as the official 0.12.0 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.12.0 release of Apache Mahout,
because...


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1021/

The git tag to be voted upon is mahout-012.0


Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Suneel Marthi
Some background on this issue:

1.  We have supported Spark and H2O as back ends since 0.10.0, with Flink
coming soon in 0.12.0, and this has been bloating the size of our release
artifacts when pushing releases to Apache mirrors. Hence we were looking at
pruning some of the components that have not been used or have been long
marked deprecated and are not being worked on.

2.  Since Mahout 0.7 release in June 2012, the project has diverged from
the MiA book even for legacy MapReduce.  Not sure if that's indeed helping
onboard new users.

3.  Seems like the consensus so far based on the user responses is to
retain the MAHOUT_LOCAL option. Thanks all for your responses.


On Mon, Mar 21, 2016 at 11:38 AM, scott cote <scottcc...@gmail.com> wrote:

> one more comment - I understand that it only works for the legacy code.
> Kill it when the legacy code is no longer deprecated, but gone ….
>
> Otherwise - you will shut out people who buy the older mahout books (such
> as MIA) which are still good reads, even though the tech is dated.
>
> SCott
>
> > On Mar 21, 2016, at 2:24 AM, David Starina <david.star...@gmail.com>
> wrote:
> >
> > Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
> > MapReduce-based code still makes sense if it is running well on Ignite.
> >
> > On Mon, Mar 21, 2016 at 8:20 AM, David Starina <david.star...@gmail.com>
> > wrote:
> >
> >> Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
> >> performance improvement good enough to reconsider leaving those
> algorithms
> >> in Mahout?
> >>
> >> On Mon, Mar 21, 2016 at 12:45 AM, Andrew Musselman <
> >> andrew.mussel...@gmail.com> wrote:
> >>
> >>> Yes I agree; will leave the question open a couple days.
> >>>
> >>> On Sunday, March 20, 2016, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>
> >>>> Maybe a better user question is: How many people are still using the
> >>>> deprecated Hadoop code?
> >>>>
> >>>> If the number is small +1 for removal.
> >>>>
> >>>> On Mar 20, 2016, at 11:04 AM, Andrew Musselman <
> >>>> andrew.mussel...@gmail.com> wrote:
> >>>>
> >>>> To clarify, the MAHOUT_LOCAL option only works for legacy Hadoop
> >>>> MapReduce-based jobs which officially became deprecated in 0.10.0.
> >>>>
> >>>> On Sun, Mar 20, 2016 at 10:25 AM, Andrew Musselman <
> >>>> andrew.mussel...@gmail.com> wrote:
> >>>>
> >>>>> Yes as I understand it.
> >>>>>
> >>>>>
> >>>>> On Sunday, March 20, 2016, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>>
> >>>>>> Are we just talking about Hadoop Mapreduce? I thought it was ignored
> >>>> when
> >>>>>> using Spark.
> >>>>>>
> >>>>>> On Mar 20, 2016, at 8:20 AM, alok tanna <tannaa...@gmail.com> wrote:
> >>>>>>
> >>>>>> -1 MAHOUT_LOCAL  is very useful for quick POC .
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Alok Tanna
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On Mar 20, 2016, at 5:01 AM, Mihai Dascalu <
> >>>>>>> mihai.dasc...@cs.pub.ro> wrote:
> >>>>>>>
> >>>>>>> -1 I still use it for fast deployment and it’s really helpful for
> >>> small
> >>>>>> local processing
> >>>>>>>
> >>>>>>> Have a great weekend!
> >>>>>>> Mihai
> >>>>>>>
> >>>>>>>> On 20 Mar 2016, at 06:13, Suneel Marthi <
> >>>>>>>> suneel.mar...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> +1 to remove this
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman <
> >>>>>>>>> andrew.mussel...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> We're discussing removing the MAHOUT_LOCAL option in order to trim
> >>>>>>>>> artifact sizes.
> >>>>>>>>>
> >>>>>>>>> If you think keeping the option to use MAHOUT_LOCAL for testing with
> >>>>>>>>> the single-node mode of Hadoop is important please let us know. It
> >>>>>>>>> can be handy for trying things out but it would be nice to ditch the
> >>>>>>>>> effort required to maintain it.
> >>>>>>>>>
> >>>>>>>>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more
> >>>>>>>>> context.
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>


Re: Removing MAHOUT_LOCAL option

2016-03-19 Thread Suneel Marthi
+1 to remove this

Sent from my iPhone

> On Mar 20, 2016, at 12:01 AM, Andrew Musselman  
> wrote:
> 
> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
> sizes.
> 
> If you think keeping the option to use MAHOUT_LOCAL for testing with the
> single-node mode of Hadoop is important please let us know. It can be handy
> for trying things out but it would be nice to ditch the effort required to
> maintain it.
> 
> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
> 
> Thanks!


[ANNOUNCE] Apache Mahout 0.11.2 Release

2016-03-12 Thread Suneel Marthi
Apache Mahout 0.11.2 Release Notes

The Apache Mahout PMC is pleased to announce the release of Mahout 0.11.2.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

The Mahout Math environment we call “Samsara” for its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs on Spark and
H2O.

To get started with Apache Mahout 0.11.2, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.11.2/.

Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.

RELEASE HIGHLIGHTS

This is a minor release over Mahout 0.11.0 meant to introduce major
performance enhancements with sparse matrix and vector computations, and
major performance optimizations to the Samsara DSL.  Mahout 0.11.2 includes
all new features and bug fixes released in Mahout versions 0.11.0 and
0.11.1.

Mahout 0.11.2 new features compared to Mahout 0.11.1.



   1. Spark 1.5.2 support.
   2. Performance improvements of over 30% on Sparse Vector and Matrix
      computations leveraging the ‘fastutil’ library - contribution from
      Sebastiano Vigna. This speeds up all in-core sparse vector and matrix
      computations.


KNOWN ISSUES

The dataset URLs in the Wikipedia Naive Bayes classification example script
(/examples/bin/classify-wikipedia.sh) have changed.  The new URL for the
smallest set is:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p00010p30302.bz2

and for the medium set:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2

To run the Wikipedia classification example, simply switch out the old URLs
with the new in classify-wikipedia.sh.
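
A minimal sketch of that substitution, done in Python rather than by hand.
The two new URLs are the ones quoted above; `swap_urls` is a hypothetical
helper name used only for illustration, and the old URLs are not quoted in
these notes, so they have to be copied from your own copy of the script.

```python
from pathlib import Path

# New dataset URLs, as quoted in the release notes above.
NEW_SMALL = "http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p00010p30302.bz2"
NEW_MEDIUM = "http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2"


def swap_urls(script_path, replacements):
    """Replace each old URL with its new one in place.

    `replacements` maps old URL -> new URL. Returns the number of
    substitutions made, so a result of 0 signals that nothing matched.
    """
    path = Path(script_path)
    text = path.read_text()
    count = 0
    for old, new in replacements.items():
        count += text.count(old)          # plain substring match, no regex
        text = text.replace(old, new)
    path.write_text(text)
    return count
```

It would be called as `swap_urls("examples/bin/classify-wikipedia.sh",
{old_small: NEW_SMALL, old_medium: NEW_MEDIUM})`, where `old_small` and
`old_medium` are the URLs currently in the script. Keeping a backup of the
script first is a reasonable precaution, since the file is rewritten in place.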

Fixed Jiras:

MAHOUT-1640: Better collections would significantly improve
vector-operation speed

MAHOUT-1800: Pare down Classtag overuse

MAHOUT-1801: FastUtil to improve speed of Sparse Matrix Operations

MAHOUT-1802: Capture attached checkpoints (if cached)


Future Roadmap:

1. Mahout 0.12.0 will be released soon and will have Apache Flink as a
supported backend execution engine.

2. Explore leveraging ViennaCL (
http://viennacl.sourceforge.net/doc/manual-license.html) as a math backend
to support dense, sparse and CUDA computations on bare metal.


Re: [VOTE] Apache Mahout 0.11.2 Release Candidate

2016-03-11 Thread Suneel Marthi
Thanks for the votes, we have 3 +1 votes and no issues reported.

This vote is now officially closed. I will send an announcement once the
release has been finalized.

On Fri, Mar 11, 2016 at 10:17 PM, Andrew Palumbo <ap@outlook.com> wrote:

> Built and tested src tar.  Ran through classification and clustering
> examples in the .zip and .tar binary distro covering Spark single machine,
> MapReduce pseudo-cluster and MAHOUT_LOCAL.  Ran the spark-shell with some
> simple distributed matrix multiplication and I/O tests in single machine
> mode.  Ran the spark-document-classifier.mscala script.  All without issue
> (aside from the URL changes mentioned in the release notes).
>
> +1
>
> ________
> From: Suneel Marthi <smar...@apache.org>
> Sent: Friday, March 11, 2016 6:03 PM
> To: user@mahout.apache.org; mahout
> Subject: Re: [VOTE] Apache Mahout 0.11.2 Release Candidate
>
> Checked {src} * {zip, tar}, ran a clean build and all tests pass.
>
> +1
>
> On Fri, Mar 11, 2016 at 5:17 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > This is the vote for release 0.11.2 of Apache Mahout.
> >
> > The vote will be going for 24 hours and will be closed on Saturday,
> > March 12th, 2016.  Please download, test and vote with
> >
> > [ ] +1, accept RC as the official 0.11.2 release of Apache Mahout
> > [ ] +0, I don't care either way,
> > [ ] -1, do not accept RC as the official 0.11.2 release of Apache Mahout,
> > because...
> >
> >
> > Maven staging repo:
> >
> > https://repository.apache.org/content/repositories/orgapachemahout-1019/
> >
> > The git tag to be voted upon is release-0.11.2
> >
>


[VOTE] Apache Mahout 0.11.2 Release Candidate

2016-03-11 Thread Suneel Marthi
This is the vote for release 0.11.2 of Apache Mahout.

The vote will be going for 24 hours and will be closed on Saturday,
March 12th, 2016.  Please download, test and vote with

[ ] +1, accept RC as the official 0.11.2 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.11.2 release of Apache Mahout,
because...


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1019/

The git tag to be voted upon is release-0.11.2


Re: New Mahout "Samsara" Book

2016-02-25 Thread Suneel Marthi
I have talked with @Pat offline about a book on the new Recommenders.

Definitely calls for another book.

On Thu, Feb 25, 2016 at 1:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> The project has moved in various ways since MiA was first published, but
> just covering Samsara leaves a lot of recommendation code that needs to be
> covered. There is room for another book.
>
>
>
>
>
> On Thu, Feb 25, 2016 at 9:32 AM, Suneel Marthi <smar...@apache.org> wrote:
>
> > The Mahout project has diverged from 'Mahout in Action' since Mahout 0.7
> > release in 2012.
> >
> >
> > On Thu, Feb 25, 2016 at 12:30 PM, FRANCISCO XAVIER SUMBA TORAL <
> > xavier.sumb...@ucuenca.ec> wrote:
> >
> > > Awesome!! This book it’s mostly about Samsara. I’ll buy it. BTW do you
> > > know if there is an update of Mahout in Action? or Is it worth to buy
> > now?
> > > It has not changed too much mahout or the approach its same with the
> > actual
> > > implementation of mahout.
> > >
> > >
> >
>


Re: New Mahout "Samsara" Book

2016-02-25 Thread Suneel Marthi
The Mahout project has diverged from 'Mahout in Action' since Mahout 0.7
release in 2012.


On Thu, Feb 25, 2016 at 12:30 PM, FRANCISCO XAVIER SUMBA TORAL <
xavier.sumb...@ucuenca.ec> wrote:

> Awesome!! This book it’s mostly about Samsara. I’ll buy it. BTW do you
> know if there is an update of Mahout in Action? or Is it worth to buy now?
> It has not changed too much mahout or the approach its same with the actual
> implementation of mahout.
>
>


Re: New Mahout "Samsara" Book

2016-02-25 Thread Suneel Marthi
Thanks Scott for the invite.

@apalumbo @Dmitriy @PatFerrel  ???

On Thu, Feb 25, 2016 at 10:19 AM, scott cote <scottcc...@gmail.com> wrote:

> Suneel and others:
>
> Anyone of ya’ll want to come to DFW Data Science sometime this summer and
> give a talk?  You can promote the book.
> You would be following on the heels of a couple of talks regarding deep
> learning and search engines.
>
> Here is the url for the user group:
>
> http://www.meetup.com/DFW-Data-Science/
>
> Our events are always on the first Monday of every month.  We have almost
> 600 members with average attendance somewhere north of 50 per event (High
> of 110 and low of 25).
>
> Cheers,
>
> SCott
>
> > On Feb 25, 2016, at 8:56 AM, Suneel Marthi <smar...@apache.org> wrote:
> >
> > You can see the TOC on Amazon
> >
> >
> http://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785
> >
> >
> > On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan <
> > pavan.naraya...@gmail.com> wrote:
> >
> >> Andrew, can you please attach table of contents if you don't mind.
> >> On Feb 25, 2016 8:05 AM, "Andrew Palumbo" <ap@outlook.com> wrote:
> >>
> >>> The new book, "Apache Mahout: Beyond MapReduce" has been released.
> >> Written
> >>> by Mahout committers Dmitriy Lyubimov and Andrew Palumbo, this book
> >> covers
> >>> previously undocumented features of Mahout releases 0.10 and 0.11.
> >>> For more information please see the announcement page:
> >>>
> >>>
> >>
> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
> >>> Thank You
> >>>
> >>>
> >>
>
>


Re: New Mahout "Samsara" Book

2016-02-25 Thread Suneel Marthi
It does give u TOC when u 'Look Inside'.

On Thu, Feb 25, 2016 at 10:16 AM, Pavan K Narayanan <
pavan.naraya...@gmail.com> wrote:

> I checked both links, they have only front and back cover of the book. No
> table of contents
> On Feb 25, 2016 9:57 AM, "Suneel Marthi" <smar...@apache.org> wrote:
>
>> You can see the TOC on Amazon
>>
>>
>> http://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785
>>
>>
>> On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan <
>> pavan.naraya...@gmail.com> wrote:
>>
>> > Andrew, can you please attach table of contents if you don't mind.
>> > On Feb 25, 2016 8:05 AM, "Andrew Palumbo" <ap@outlook.com> wrote:
>> >
>> > > The new book, "Apache Mahout: Beyond MapReduce" has been released.
>> > Written
>> > > by Mahout committers Dmitriy Lyubimov and Andrew Palumbo, this book
>> > covers
>> > > previously undocumented features of Mahout releases 0.10 and 0.11.
>> > > For more information please see the announcement page:
>> > >
>> > >
>> >
>> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
>> > > Thank You
>> > >
>> > >
>> >
>>
>


Re: New Mahout "Samsara" Book

2016-02-25 Thread Suneel Marthi
You can see the TOC on Amazon

http://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785


On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan <
pavan.naraya...@gmail.com> wrote:

> Andrew, can you please attach table of contents if you don't mind.
> On Feb 25, 2016 8:05 AM, "Andrew Palumbo"  wrote:
>
> > The new book, "Apache Mahout: Beyond MapReduce" has been released.
> Written
> > by Mahout committers Dmitriy Lyubimov and Andrew Palumbo, this book
> covers
> > previously undocumented features of Mahout releases 0.10 and 0.11.
> > For more information please see the announcement page:
> >
> >
> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
> > Thank You
> >
> >
>


Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Suneel Marthi
Are u working off of Mahout 0.11.1 ? 0.11.1 has been certified for Spark
1.5 but compatible with 1.6.


On Tue, Feb 2, 2016 at 12:10 PM, BahaaEddin AlAila 
wrote:

> Thank you very much for your reply.
> As I mentioned earlier, I am using mahoutSparkContext, and MAHOUT_HOME is
> set to the correct mahout path.
> I also have tried setting up the context myself as I looked into the
> implementation of mahoutSparkContext and supplied the jars path manually.
> still the same error.
> I will try with spark 1.5 and report.
>
> Thank you very much again,
>
> Kind Regards,
> Bahaa
>
>
> On Tue, Feb 2, 2016 at 12:01 PM, Dmitriy Lyubimov 
> wrote:
>
> > Bahaa, first off, i don't think we have certified any of releases to run
> > with spark 1.6 (yet). I think spark 1.5 is the last known release to run
> > with 0.11 series.
> >
> > Second, if you use mahoutSparkContext() method to create context, it
> > would look for MAHOUT_HOME setup to add mahout binaries to the job. So
> > the reason you may not be getting it is perhaps that you are not using
> > mahoutSparkContext()?
> >
> > alternatively, you can create context yourself, but you need to (1) make
> > sure it has enabled and configured Kryo serialization properly, and (2)
> > have added all necessary mahout jars on your own.
> >
> > -d
> >
> > On Tue, Feb 2, 2016 at 8:22 AM, BahaaEddin AlAila wrote:
> >
> > > Greetings mahout users,
> > >
> > > I have been trying to use mahout samsara as a library with scala/spark,
> > > but I haven't been successful in doing so.
> > >
> > > I am running spark 1.6.0 binaries, didn't build it myself.
> > > However, I tried both readily available binaries on Apache mirrors, and
> > > cloning and compiling mahout's repo, but neither worked.
> > >
> > > I keep getting
> > >
> > > Exception in thread "main" java.lang.NoClassDefFoundError:
> > > org/apache/mahout/sparkbindings/SparkDistributedContext
> > >
> > > The way I am doing things is:
> > > I have spark in ~/spark-1.6
> > > and mahout in ~/mahout
> > > I have set both $SPARK_HOME and $MAHOUT_HOME accordingly, along with
> > > $MAHOUT_LOCAL=true
> > >
> > > and I have:
> > >
> > > ~/app1/build.sbt
> > > ~/app1/src/main/scala/App1.scala
> > >
> > > in build.sbt I have these lines to declare mahout dependencies:
> > >
> > > libraryDependencies += "org.apache.mahout" %% "mahout-math-scala" % "0.11.1"
> > >
> > > libraryDependencies += "org.apache.mahout" % "mahout-math" % "0.11.1"
> > >
> > > libraryDependencies += "org.apache.mahout" % "mahout-spark_2.10" % "0.11.1"
> > >
> > > along with other spark dependencies
> > >
> > > and in App1.scala, in the main function, I construct a context object
> > > using mahoutSparkContext, and of course, the sparkbindings are imported
> > >
> > > everything compiles successfully
> > >
> > > however, when I submit to spark, I get the above mentioned error.
> > >
> > > I have a general idea of why this is happening: because the compiled
> > > app1 jar depends on mahout-spark dependency jar but it cannot find it in
> > > the class path upon being submitted to spark.
> > >
> > > In the instructions I couldn't find how to explicitly add the
> > > mahout-spark dependency jar to the class path.
> > >
> > > The question is: Am I doing the configurations correctly or not?
> > >
> > > Sorry for the lengthy email
> > >
> > > Kind Regards,
> > > Bahaa
> > >
> >
>
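
For anyone hitting the same NoClassDefFoundError: one way to put the Mahout
jars on the class path at submit time is spark-submit's `--jars` option,
which takes a comma-separated list. The sketch below only builds that command
line. It assumes the jars sit directly under MAHOUT_HOME as `mahout-*.jar`
(as in a binary distribution; a source build keeps them under each module's
target directory), and `spark_submit_command`, `app1.jar` and `App1` are
hypothetical names taken from the layout described in the question. It has
not been verified against the setup in this thread.

```python
import glob
import os


def spark_submit_command(app_jar, main_class, mahout_home=None, master="local[*]"):
    """Build a spark-submit argument list that ships the Mahout jars with the job."""
    mahout_home = mahout_home or os.environ["MAHOUT_HOME"]
    # Assumption: jars live directly under MAHOUT_HOME; adjust the pattern
    # for a source build.
    jars = sorted(glob.glob(os.path.join(mahout_home, "mahout-*.jar")))
    if not jars:
        raise FileNotFoundError(f"no mahout-*.jar files found under {mahout_home}")
    return [
        "spark-submit",
        "--master", master,
        "--class", main_class,
        "--jars", ",".join(jars),   # --jars expects a comma-separated list
        app_jar,
    ]
```

The returned list can be passed to `subprocess.run(...)` or joined and pasted
into a shell. Building a single assembly ("fat") jar of the application that
bundles its dependencies is the other common route and avoids the separate
jar list entirely.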


Re: Some test results

2015-12-30 Thread Suneel Marthi


On Wed, Dec 30, 2015 at 2:57 PM, Dmitriy Lyubimov  wrote:

> Nice!
> On Dec 30, 2015 11:51 AM, "Pat Ferrel"  wrote:
>
> > As many of you know Mahout-Samsara includes an interesting and important
> > extension to cooccurrence similarity, which supports cross-cooccurrence
> > and log-likelihood downsampling. This, when combined with a search engine,
> > gives us a multimodal recommender. Some of us integrated Mahout with a DB
> > and search engine to create what we call (humbly) the Universal
> > Recommender.
> >
> > We just completed a tool that measures the effects of what we call
> > secondary events or indicators using the Universal Recommender. It
> > calculates a ranking-based precision metric called mean average
> > precision, MAP@k. We took a dataset from the Rotten Tomatoes web site of
> > “fresh” and “rotten” reviews and combined that with data about the
> > genres, casts, directors, and writers of the various video items. This
> > gave us the indicators below:
> > like, video-id <== primary indicator
> > dislike, video-id
> > like-genre, genre-id
> > dislike-genre, genre-id
> > like-director, director-id
> > dislike-director, director-id
> > like-writer, writer-id
> > dislike-writer, writer-id
> > like-cast, cast-member-id
> > dislike-cast, cast-member-id
> > These aren’t necessarily what we would have chosen if we were designing
> > something from scratch but are possible to gather from public data.
> >
> > We have only ~5000 mostly professional reviewers with ~250k video items
> > in this dataset but have a larger one we are integrating. We are also
> > writing a white paper and blog post with some deeper analysis. There are
> > several tidbits of insight when you look deeper.
> >
> > The bottom line is that using most of the above indicators we were able
> > to get a 26% increase in MAP@1 over using only “like”. This is important
> > because the vast majority of recommenders can only really ingest one type
> > of indicator.
> >
> > http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
> >
> > https://github.com/actionml/template-scala-parallel-universal-recommendation
>
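
For readers unfamiliar with the metric mentioned above, here is a minimal
sketch of how MAP@k is commonly computed. It is not the tool described in the
thread: the function names are hypothetical, and dividing by
min(number of relevant items, k) is just one common normalization, so the
numbers it yields may differ from the ones reported. It assumes each ranked
list contains no duplicate items.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: precision at each rank that hits a relevant item,
    summed over the top k and normalized by min(len(relevant), k)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank      # precision at this rank
    return score / min(len(relevant), k)


def map_at_k(recommendations, held_out, k):
    """MAP@k: mean of AP@k over all users that have held-out items."""
    users = [u for u in held_out if held_out[u]]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(u, []), held_out[u], k)
        for u in users
    )
    return total / len(users)
```

Here `recommendations` maps each user to a ranked list of item ids and
`held_out` maps each user to the items withheld for evaluation. Comparing
`map_at_k` for a model trained on the primary indicator alone against one
trained with the secondary indicators gives the kind of relative lift quoted
above.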


Re: Updated books

2015-12-11 Thread Suneel Marthi
All of the below mentioned books are still based on Mahout 0.10 and cover
the old MapReduce algorithms - all of which have been long
deprecated/retired/"to be purged".

There's no book on Mahout out there today that deals with the new Mahout,
so it's best you bring up your questions on the email lists and have them
answered here.


On Fri, Dec 11, 2015 at 11:18 PM, Cristian Barrientos Montoya <
cs3...@gmail.com> wrote:

> You could use some like these:
>
>
> https://www.packtpub.com/big-data-and-business-intelligence/learning-apache-mahout-classification
>
> https://www.packtpub.com/big-data-and-business-intelligence/apache-mahout-essentials
>
> https://www.packtpub.com/big-data-and-business-intelligence/apache-mahout-cookbook
>
> 2015-12-11 21:28 GMT-05:00 FRANCISCO XAVIER SUMBA TORAL <
> xavier.sumb...@ucuenca.ec>:
>
> > Hi,
> >
> > I am learning about Mahout and text processing. So I started reading Mahout
> > in Action [1], but it's from 2011, and Taming Text [2], which is from 2012.
> > I am asking because the platform has new features and I would like
> > to know if these books are still up to date.
> >
> > I can use the platform to run some simple experiments, but I am working on
> > my thesis and I need to know the platform more deeply. So if you know of any
> > other books, papers, workshops or tutorials I would be very grateful.
> >
> > Cheers.
> > Xavier Sumba
> >
> > [1] https://www.manning.com/books/mahout-in-action
> > [2] https://www.manning.com/books/taming-text
> >
> >
>


Re: Mahout item based recommender help documentation

2015-12-02 Thread Suneel Marthi
Mahout 0.9 isn't supported anymore; I suggest that you upgrade to Mahout
0.11.0, which is Spark 1.3+ compatible.



On Wed, Dec 2, 2015 at 7:22 PM, Weiqing Jin  wrote:

> Hi, I am new to Mahout. I am using Mahout on Cloudera CDH5.3. I believe it
> has version 0.9. Wondering how I can get help documentation. Specifically I
> am trying to use the item-based recommender algorithm (as below). I downloaded
> the Mahout 0.9 distribution files, but am not able to find help
> specifically on the function below, such as what a parameter like
> --numRecommendations means. Am I missing some step here? Thanks.
>
> mahout recommenditembased [--input --output --numRecommendations
>   --usersFile --itemsFile --filterFile --booleanData --maxPrefsPerUser
>   --minPrefsPerUser --maxSimilaritiesPerItem --maxPrefsInItemSimilarity
>   --similarityClassname --threshold --outputPathForSimilarityMatrix
>   --randomSeed --sequencefileOutput --help --tempDir --startPhase
>   --endPhase]
>
> --similarityClassname (-s) similarityClassname
>     Name of distributed similarity measures class to instantiate,
>     alternatively use one of the predefined similarities
>     ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD,
>     SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK,
>     SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION,
>     SIMILARITY_EUCLIDEAN_DISTANCE])
>
> Eric Jin
> Retail Service Decision Management
> eric@citi.com 224-222-2590
>
>


Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Suneel Marthi
D, was there a JIRA for this? Have a vague recollection that we may have
addressed a similar thing during summer on one of the branches (most likely
0.10.x). No ?

On Fri, Nov 6, 2015 at 7:05 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> hm. I did not find the staging repo. is it gone already?
>
> One thing, if i may whine (I already asked for it last time):
> Can we please publish -tests artifacts, please pretty please?
>
> it is so much easier if derived applications could re-use mahout testing
> framework.
>
>
> On Fri, Nov 6, 2015 at 2:57 PM, Andrew Musselman <
> andrew.mussel...@gmail.com
> > wrote:
>
> > Checked sigs, built and ran some calculations in spark-shell from tar and
> > zip.
> >
> > +1 binding
> >
> > On Fri, Nov 6, 2015 at 2:41 PM, Andrew Palumbo <ap@outlook.com>
> wrote:
> >
> > > 1. Downloaded and built {src} {tar}- all tests passed.
> > > 2. Started shell from {src} {bin} *{tar} distro and ran some
> distributed
> > > algebra and I/O tests- no problems.
> > > 3. Ran MR Wikipedia example.
> > > 4. Ran Spark CLI naive bayes examples.
> > >
> > > +1 (binding)
> > >
> > >
> > > 
> > > From: Suneel Marthi <smar...@apache.org>
> > > Sent: Friday, November 6, 2015 3:05 PM
> > > To: mahout; user@mahout.apache.org
> > > Subject: Re: [VOTE] Apache Mahout 0.11.1 Release Candidate
> > >
> > > 1. Downloaded {src}* {zip,tar}
> > > 2. Ran a clean build and all tests pass
> > > 3. Spun up Mahout Spark Shell from the compiled artifacts and ran a few
> > > Samsara queries, tests passed
> > > 4. Downloaded {bin} * {zip, tar}
> > > 5. Spun up Mahout Spark Shell from the compiled artifacts and ran a few
> > > Samsara queries, tests passed
> > >
> > > Here's my +1 (binding)
> > >
> > >
> > > On Fri, Nov 6, 2015 at 2:41 PM, Suneel Marthi <suneel.mar...@gmail.com
> >
> > > wrote:
> > >
> > > > Please vote on releasing the following candidate as Apache Mahout
> > version
> > > > 0.11.1:
> > > >
> > > > Branch:
> > > > release-0.11.1
> > > > (see https://git1-us-west.apache.org/repos/asf?p=mahout.git)
> > > >
> > > > The release artifacts to be voted on can be found at:
> > > >
> > > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1018/org/apache/mahout/apache-mahout-distribution/0.11.1/
> > > >
> > > > The release artifacts are signed with the key with fingerprint
> D3541808
> > > > found at:
> > > > http://www.apache.org/dist/mahout/KEYS
> > > >
> > > > The staging repository for this release can be found at:
> > > >
> > https://repository.apache.org/content/repositories/orgapachemahout-1018
> > > >
> > > > -
> > > >
> > > > The vote is open for the next 72 hours and passes if a majority of at
> > > > least
> > > > three +1 PMC votes are cast.
> > > >
> > > > The vote ends on Sunday November 8, 2015.
> > > >
> > > > [ ] +1 Release this package as Apache Mahout 0.11.1
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > ===
> > > >
> > >
> >
>


Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Suneel Marthi
Thanks. We have 3 +1 votes and no -1s.

This release has passed and the voting is officially closed; I will send an
announcement out when the release has been finalized.

Thanks again.

On Fri, Nov 6, 2015 at 5:57 PM, Andrew Musselman <andrew.mussel...@gmail.com
> wrote:

> Checked sigs, built and ran some calculations in spark-shell from tar and
> zip.
>
> +1 binding
>
> On Fri, Nov 6, 2015 at 2:41 PM, Andrew Palumbo <ap@outlook.com> wrote:
>
> > 1. Downloaded and built {src} {tar}- all tests passed.
> > 2. Started shell from {src} {bin} *{tar} distro and ran some distributed
> > algebra and I/O tests- no problems.
> > 3. Ran MR Wikipedia example.
> > 4. Ran Spark CLI naive bayes examples.
> >
> > +1 (binding)
> >
> >
> > 
> > From: Suneel Marthi <smar...@apache.org>
> > Sent: Friday, November 6, 2015 3:05 PM
> > To: mahout; user@mahout.apache.org
> > Subject: Re: [VOTE] Apache Mahout 0.11.1 Release Candidate
> >
> > 1. Downloaded {src}* {zip,tar}
> > 2. Ran a clean build and all tests pass
> > 3. Spun up Mahout Spark Shell from the compiled artifacts and ran a few
> > Samsara queries, tests passed
> > 4. Downloaded {bin} * {zip, tar}
> > 5. Spun up Mahout Spark Shell from the compiled artifacts and ran a few
> > Samsara queries, tests passed
> >
> > Here's my +1 (binding)
> >
> >
> > On Fri, Nov 6, 2015 at 2:41 PM, Suneel Marthi <suneel.mar...@gmail.com>
> > wrote:
> >
> > > Please vote on releasing the following candidate as Apache Mahout
> version
> > > 0.11.1:
> > >
> > > Branch:
> > > release-0.11.1
> > > (see https://git1-us-west.apache.org/repos/asf?p=mahout.git)
> > >
> > > The release artifacts to be voted on can be found at:
> > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1018/org/apache/mahout/apache-mahout-distribution/0.11.1/
> > >
> > > The release artifacts are signed with the key with fingerprint D3541808
> > > found at:
> > > http://www.apache.org/dist/mahout/KEYS
> > >
> > > The staging repository for this release can be found at:
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1018
> > >
> > > -
> > >
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least
> > > three +1 PMC votes are cast.
> > >
> > > The vote ends on Sunday November 8, 2015.
> > >
> > > [ ] +1 Release this package as Apache Mahout 0.11.1
> > > [ ] -1 Do not release this package because ...
> > >
> > > ===
> > >
> >
>


Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Suneel Marthi
Well, let's target that for the next release, 0.12.0 (or the "ultimate"
bucket); please create a JIRA.

On Fri, Nov 6, 2015 at 8:46 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> can't remember; bottom line there are no officially published binary test
> artifacts in 0.11 afaict
>
>
> On Fri, Nov 6, 2015 at 5:17 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > D, was there a JIRA for this? Have a vague recollection that we may have
> > addressed a similar thing during summer on one of the branches (most
> likely
> > 0.10.x). No ?
> >
> > On Fri, Nov 6, 2015 at 7:05 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >
> > > hm. I did not find the staging repo. is it gone already?
> > >
> > > One thing, if i may whine (I already asked for it last time):
> > > Can we please publish -tests artifacts, please pretty please?
> > >
> > > it is so much easier if derived applications could re-use mahout
> testing
> > > framework.
> > >
> > >
> > > On Fri, Nov 6, 2015 at 2:57 PM, Andrew Musselman <
> > > andrew.mussel...@gmail.com
> > > > wrote:
> > >
> > > > Checked sigs, built and ran some calculations in spark-shell from tar
> > and
> > > > zip.
> > > >
> > > > +1 binding
> > > >
> > > > On Fri, Nov 6, 2015 at 2:41 PM, Andrew Palumbo <ap@outlook.com>
> > > wrote:
> > > >
> > > > > 1. Downloaded and built {src} {tar}- all tests passed.
> > > > > 2. Started shell from {src} {bin} *{tar} distro and ran some
> > > distributed
> > > > > algebra and I/O tests- no problems.
> > > > > 3. Ran MR Wikipedia example.
> > > > > 4. Ran Spark CLI naive bayes examples.
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > >
> > > > > 
> > > > > From: Suneel Marthi <smar...@apache.org>
> > > > > Sent: Friday, November 6, 2015 3:05 PM
> > > > > To: mahout; user@mahout.apache.org
> > > > > Subject: Re: [VOTE] Apache Mahout 0.11.1 Release Candidate
> > > > >
> > > > > 1. Downloaded {src}* {zip,tar}
> > > > > 2. Ran a clean build and all tests pass
> > > > > 3. Spun up Mahout Spark Shell from the compiled artifacts and ran a
> > few
> > > > > Samsara queries, tests passed
> > > > > 4. Downloaded {bin} * {zip, tar}
> > > > > 5. Spun up Mahout Spark Shell from the compiled artifacts and ran a
> > few
> > > > > Samsara queries, tests passed
> > > > >
> > > > > Here's my +1 (binding)
> > > > >
> > > > >
> > > > > On Fri, Nov 6, 2015 at 2:41 PM, Suneel Marthi <
> > suneel.mar...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Please vote on releasing the following candidate as Apache Mahout
> > > > version
> > > > > > 0.11.1:
> > > > > >
> > > > > > Branch:
> > > > > > release-0.11.1
> > > > > > (see https://git1-us-west.apache.org/repos/asf?p=mahout.git)
> > > > > >
> > > > > > The release artifacts to be voted on can be found at:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1018/org/apache/mahout/apache-mahout-distribution/0.11.1/
> > > > > >
> > > > > > The release artifacts are signed with the key with fingerprint
> > > D3541808
> > > > > > found at:
> > > > > > http://www.apache.org/dist/mahout/KEYS
> > > > > >
> > > > > > The staging repository for this release can be found at:
> > > > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapachemahout-1018
> > > > > >
> > > > > > -
> > > > > >
> > > > > > The vote is open for the next 72 hours and passes if a majority
> of
> > at
> > > > > > least
> > > > > > three +1 PMC votes are cast.
> > > > > >
> > > > > > The vote ends on Sunday November 8, 2015.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Mahout 0.11.1
> > > > > > [ ] -1 Do not release this package because ...
> > > > > >
> > > > > > ===
> > > > > >
> > > > >
> > > >
> > >
> >
>


[ANNOUNCE] Apache Mahout 0.11.1 Release

2015-11-06 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.11.1.


Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

The Mahout Math environment we call “Samsara (संसारा )” for its symbol of
universal renewal. It reflects a fundamental rethinking of how scalable
machine learning algorithms are built and customized. Mahout-Samsara is
here to help people create their own math while providing some
off-the-shelf algorithm implementations. At its base are general linear
algebra and statistical operations along with the data structures to
support them. It’s written in Scala with Mahout-specific extensions, and
runs on Spark and H2O.

To get started with Apache Mahout 0.11.1, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.11.1/.

Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.


RELEASE HIGHLIGHTS

This is a minor release over Mahout 0.11.0 meant to expand Mahout’s
compatibility with Spark versions, to introduce some new features and to
fix some bugs.  Mahout 0.11.1 includes all new features and bug fixes
released in Mahout versions 0.11.0 and earlier.

Mahout 0.11.1 new features compared to Mahout 0.11.0



   1. Spark 1.4+ support.
   2. 4x performance improvement in Dot Product over Dense Vectors (
      https://issues.apache.org/jira/browse/MAHOUT-1781)
   3. %*% optimization based on matrix flavors.


Note: Mahout 0.11.1 artifacts are binary compatible with Spark 1.4.x and
Spark 1.5+.

STATS

A total of 34 separate JIRA issues are addressed in this release [1], with
10 bugfixes.

GETTING STARTED

Download the release artifacts and signatures at
http://www.apache.org/dist/mahout/0.11.1/ The examples directory contains
several working examples of the core functionality available in Mahout.
These can be run via scripts in the examples/bin directory. Most examples
do not need a Hadoop cluster in order to run.

FUTURE PLANS

Integration with Apache Flink is in the works and is targeted for Mahout
Release 0.12.0 in collaboration with TU Berlin and Data Artisans to add
Flink as the 3rd execution engine to Mahout. This would be in addition to
existing Apache Spark and H2O engines.

To see progress on this branch look here:
https://github.com/apache/mahout/commits/flink-binding.

KNOWN ISSUES


   - In the binary zip or tar distribution, the example data for
     mahout/examples/bin/run-item-sim is missing. To run it, get the csv files
     from Github [3].
   - OOM errors are observed on Mac OS with Java 7 when trying to run the
     Mahout Spark Shell; it works fine with Java 8.


[1]
https://issues.apache.org/jira/browse/MAHOUT-1787?jql=project%20%3D%20MAHOUT%20AND%20status%20in%20%28Resolved%2C%20closed%29%20AND%20%28fixVersion%20%3D%200.11.1%20%29

[2] http://mahout.apache.org/developers/how-to-contribute.html
[3] https://github.com/apache/mahout/tree/master/examples/src/main/resources


Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Suneel Marthi
This Vote is cancelled, a new Release Candidate will be put out sometime
today.

On Fri, Nov 6, 2015 at 1:54 AM, Suneel Marthi <smar...@apache.org> wrote:

> Please vote on releasing the following candidate as Apache Mahout version
> 0.11.1:
>
> Branch:
> release-0.11.1
> (see https://git1-us-west.apache.org/repos/asf?p=mahout.git)
>
> The release artifacts to be voted on can be found at:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1017/org/apache/mahout/apache-mahout-distribution/0.11.1/
>
> The release artifacts are signed with the key with fingerprint D3541808:
> http://www.apache.org/dist/mahout/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachemahout-1017
>
> -
>
> The vote is open for the next 72 hours and passes if a majority of at
> least
> three +1 PMC votes are cast.
>
> The vote ends on Sunday November 8, 2015.
>
> [ ] +1 Release this package as Apache Mahout 0.11.1
> [ ] -1 Do not release this package because ...
>
> ===
>


[VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Suneel Marthi
Please vote on releasing the following candidate as Apache Mahout version
0.11.1:

Branch:
release-0.11.1
(see https://git1-us-west.apache.org/repos/asf?p=mahout.git)

The release artifacts to be voted on can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1018/org/apache/mahout/apache-mahout-distribution/0.11.1/

The release artifacts are signed with the key with fingerprint D3541808
found at:
http://www.apache.org/dist/mahout/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1018

-

The vote is open for the next 72 hours and passes if a majority of at least
three +1 PMC votes are cast.

The vote ends on Sunday November 8, 2015.

[ ] +1 Release this package as Apache Mahout 0.11.1
[ ] -1 Do not release this package because ...

===


Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Suneel Marthi
1. Downloaded {src}* {zip,tar}
2. Ran a clean build and all tests pass
3. Spun up Mahout Spark Shell from the compiled artifacts and ran a few
Samsara queries, tests passed
4. Downloaded {bin} * {zip, tar}
5. Spun up Mahout Spark Shell from the compiled artifacts and ran a few
Samsara queries, tests passed

Here's my +1 (binding)


On Fri, Nov 6, 2015 at 2:41 PM, Suneel Marthi <suneel.mar...@gmail.com>
wrote:

> Please vote on releasing the following candidate as Apache Mahout version
> 0.11.1:
>
> Branch:
> release-0.11.1
> (see https://git1-us-west.apache.org/repos/asf?p=mahout.git)
>
> The release artifacts to be voted on can be found at:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1018/org/apache/mahout/apache-mahout-distribution/0.11.1/
>
> The release artifacts are signed with the key with fingerprint D3541808
> found at:
> http://www.apache.org/dist/mahout/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachemahout-1018
>
> -
>
> The vote is open for the next 72 hours and passes if a majority of at
> least
> three +1 PMC votes are cast.
>
> The vote ends on Sunday November 8, 2015.
>
> [ ] +1 Release this package as Apache Mahout 0.11.1
> [ ] -1 Do not release this package because ...
>
> ===
>


[VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-05 Thread Suneel Marthi
Please vote on releasing the following candidate as Apache Mahout version
0.11.1:

Branch:
release-0.11.1
(see https://git1-us-west.apache.org/repos/asf?p=mahout.git)

The release artifacts to be voted on can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1017/org/apache/mahout/apache-mahout-distribution/0.11.1/

The release artifacts are signed with the key with fingerprint D3541808:
http://www.apache.org/dist/mahout/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1017

-

The vote is open for the next 72 hours and passes if a majority of at least
three +1 PMC votes are cast.

The vote ends on Sunday November 8, 2015.

[ ] +1 Release this package as Apache Mahout 0.11.1
[ ] -1 Do not release this package because ...

===


Re: Haters get Love too

2015-11-03 Thread Suneel Marthi
Thanks Pat, very interesting indeed.

On Tue, Nov 3, 2015 at 6:20 PM, Pat Ferrel  wrote:

> A colleague of mine just built a MAP@k precision evaluator for the Mahout
> based cooccurrence recommender we’ve been working on and we ran some data
> scraped from rottentomatoes.com. They have
> “fresh” and “rotten” reviews tied to reviewer ids.
>
> A fair bit of discussion has gone on about how to use negative
> preferences. We have been saying that negative preferences might be
> predictive of positive preferences and the cross-cooccurrence code in the
> new SimilarityAnalysis.cooccurrence method can make the data usable.
>
> We took the RT data for two “actions”: “fresh” as the primary, the best
> indicator of preference, and “rotten” as the secondary indicator. We found
> that MAP using only “fresh” was bettered by almost 20% when we included
> “rotten” as the secondary cross-cooccurrence action. For the strict out
> there, we did not directly isolate the two actions, which is work remaining,
> so some of the lift might be due to just having more data, but it’s a really
> good first step because more data doesn't always translate to better
> performance and anyway it’s data you wouldn’t have otherwise.
>
> This opens up a new way to compare all sorts of other user signals, some
> long considered to be unusable by recommenders. Gender, location, category
> preferences are now fair game for testing.
>
> BTW we used this recommender, which is based on Mahout Samsara’s matrix
> math, cooccurrence and LLR.
> https://github.com/pferrel/scala-parallel-universal-recommendation
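
To make the cross-cooccurrence idea in the quoted message concrete, here is a
small sketch in plain Python. It is not the SimilarityAnalysis code: the
function names and toy data are hypothetical. It counts, for each (primary
item, secondary item) pair, how many users performed both actions, and scores
the pair with the log-likelihood ratio (LLR) over the resulting 2x2 table,
following Dunning's G^2 formulation.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


def users_by_item(actions):
    index = {}
    for user, items in actions.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def cross_cooccurrence_llr(primary, secondary):
    """Score each (primary item, secondary item) pair that shares users."""
    n_users = len(set(primary) | set(secondary))
    scores = {}
    for i, ui in users_by_item(primary).items():
        for j, uj in users_by_item(secondary).items():
            k11 = len(ui & uj)          # users with both actions
            if k11 == 0:
                continue
            k12 = len(ui) - k11         # primary action only
            k21 = len(uj) - k11         # secondary action only
            k22 = n_users - k11 - k12 - k21
            scores[(i, j)] = llr(k11, k12, k21, k22)
    return scores


# Hypothetical data: "fresh" as primary action, "rotten" as secondary.
fresh = {"u1": {"A"}, "u2": {"A"}, "u3": {"B"}, "u4": {"B"}}
rotten = {"u1": {"X"}, "u2": {"X"}, "u3": {"Y"}, "u4": {"Y"}}
scores = cross_cooccurrence_llr(fresh, rotten)
print(scores)
```

A high score for (A, X) says that disliking X is informative about liking A,
which is how a negative preference can still contribute to recommendations.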


Re: Is Mahout obsolete now?

2015-10-19 Thread Suneel Marthi
Thanks Sean.
Samsara is the new distributed linear algebra DSL that is engine agnostic
and presently supports Spark and H2O (Flink is in the works).

We do have Recommenders built on top of Samsara today.

On Mon, Oct 19, 2015 at 3:42 PM, Sean Owen  wrote:

> No, this is pretty wrong. Spark is not, in general, a real-time
> anything. Spark Streaming is a near-real-time streaming framework, but
> it is not something you can build models with. Spark MLlib / ML are
> offline / batch. Not sure what you mean by Hadoop engine, but Spark
> does not build on MapReduce, if that's what you mean.
>
> The "classic" Mahout code (<= 0.9) is definitely deprecated. The "new"
> Mahout is not. It has a fairly different new recommender system called
> Samsara. It has Scala APIs. In fact, it uses Spark. I think you're
> somehow talking about the "classic" Mahout code here only.
>
> On Mon, Oct 19, 2015 at 2:38 PM, Fei Shan 
> wrote:
> > Spark is an in-memory, near-realtime Machine Learning framework, has
> > Scala and Java interfaces
> > Mahout is an offline Machine Learning framework, no Scala APIs
> >
> > they are both built on HDFS and the Hadoop engine
> >
> > Spark has an ecosystem like Hadoop
> > Mahout is part of the Hadoop ecosystem
> >
> > Spark could beat Mahout on processing speed
> > and concise programming APIs
> >
> > for online data analysis, Spark is a better choice.
> > for offline data analysis, both fit well.
> >
> >
> >
> > On Mon, Oct 19, 2015 at 9:14 PM, Prasad Priyadarshana Fernando <
> > bpp...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> If I have used Mahout for my recommendation application, should I migrate
> >> to Spark MLlib technology? Is Mahout still supported and migrated?
> >>
> >> Thanks
> >>
> >> *Prasad Priyadarshana Fernando <
> http://www.linkedin.com/in/prasadfernando
> >> >*
> >> Mobile: +1 330 283 5827
> >>
>


Re: Is Mahout obsolete now?

2015-10-19 Thread Suneel Marthi
This is so inaccurate and not true. You obviously have not been following the
Mahout project. Mahout has long since moved away from MapReduce and presently
supports Spark and H2O, with Flink to follow, as execution engines.

I would suggest you look at the recent Mahout 0.11.0 and see where the
project is before we delve into a comparison of Mahout vs Spark.


On Mon, Oct 19, 2015 at 3:38 PM, Fei Shan  wrote:

> Spark is an in-memory, near-realtime Machine Learning framework, has
> Scala and Java interfaces
> Mahout is an offline Machine Learning framework, no Scala APIs
>
> they are both built on HDFS and the Hadoop engine
>
> Spark has an ecosystem like Hadoop
> Mahout is part of the Hadoop ecosystem
>
> Spark could beat Mahout on processing speed
> and concise programming APIs
>
> for online data analysis, Spark is a better choice.
> for offline data analysis, both fit well.
>
>
>
> On Mon, Oct 19, 2015 at 9:14 PM, Prasad Priyadarshana Fernando <
> bpp...@gmail.com> wrote:
>
> > Hi,
> >
> > If I have used Mahout for my recommendation application, should I migrate
> > to Spark MLlib technology? Is Mahout still supported and migrated?
> >
> > Thanks
> >
> > *Prasad Priyadarshana Fernando <
> http://www.linkedin.com/in/prasadfernando
> > >*
> > Mobile: +1 330 283 5827
> >
>


Re: Is Mahout obsolete now?

2015-10-19 Thread Suneel Marthi
Hi Prasad,

As Sean has explained in an earlier posting on this thread, Mahout 0.9 and
earlier, which were MapReduce based, are not supported anymore.

We do have recommenders in Mahout 0.11.0 that have been built on the new
Samsara Math DSL.

Definitely would suggest that you check out the latest Mahout 0.11.0
release.

On Mon, Oct 19, 2015 at 3:14 PM, Prasad Priyadarshana Fernando <
bpp...@gmail.com> wrote:

> Hi,
>
> If I have used Mahout for my recommendation application, should I migrate
> to Spark MLlib technology? Is Mahout still supported and migrated?
>
> Thanks
>
> *Prasad Priyadarshana Fernando <http://www.linkedin.com/in/prasadfernando>*
> Mobile: +1 330 283 5827
>


Re: examples/bin/cluster-reuters.sh fails on k-means: Option (1)

2015-09-16 Thread Suneel Marthi
try running /bin/cluster-reuters.sh --help

to see the list of expected input options.


On Wed, Sep 16, 2015 at 8:54 PM, Disa Mhembere 
wrote:

> Hello all,
>
> I'm running mahout 0.11.1 on an ubuntu 14.04 server. I compiled the library
> with maven 3.3.3 under java 1.7 and tried to run the kmeans example
> in examples/bin/cluster-reuters.sh (i.e., choice 1). The script fails with:
>
> 15/09/15 23:27:35 ERROR AbstractJob: Missing value(s) --input
> Missing value(s) --input
> Usage:
>  [--input  --output  --outputFormat 
> --substring
>  --numWords  --pointsDir  --samplePoints
>  --dictionary  --dictionaryType 
> --evaluate --distanceMeasure  --help --tempDir 
> --startPhase  --endPhase ]
> --input (-i) input    Path to job input directory.
> 15/09/15 23:27:35 INFO MahoutDriver: Program took 119 ms (Minutes:
> 0.0019835)
> cat: /tmp/mahout-work-disa/reuters-kmeans/clusterdump: No such file or
> directory
>
> Any ideas how to resolve this? It seems like the script should provide the
> correct input (-i), but it does not, and I'm not certain what the correct
> input is myself.
>
> FYI: Other options seem to work like 2, 3.
>
> Sincere thanks,
>
> --
> Disa Mhembere *:(){ :|:& };:*
> Johns Hopkins University. d...@jhu.edu
>


[RESULT] [VOTE] Apache Mahout 0.10.2 Release

2015-08-06 Thread Suneel Marthi
We had 3 +1 PMC votes and no -1s, the release has passed and the voting is
now closed.


[ANNOUNCE] Apache Mahout 0.10.2 Release

2015-08-06 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.10.2.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

The Mahout Math environment we call “Samsara” for its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs most fully on
Spark.

To get started with Apache Mahout 0.10.2, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.10.2/.

Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.


RELEASE HIGHLIGHTS

This is an incremental minor release over Mahout 0.10.1 meant to introduce
several new features (all of which are also available in the 0.11 lineage)
and fix a few bugs.


Mahout 0.10.2


   1. In-core transpose view rewrites. Modifiable transpose views, e.g. (for
      (col <- a.t) col := 5).
   2. Performance and parallelization improvements for AB', A'B, A'A Spark
      physical operators.
   3. Optional structural flavor abstraction for in-core matrices. In-core
      matrices can now be tagged as e.g. sparse or dense.
   4. %*% optimization based on matrix flavors.
   5. In-core ::= sparse assignment functions.
   6. Assign := optimization (do proper traversal based on matrix flavors,
      similarly to %*%).
   7. Adding in-place elementwise functional assignment (e.g. mxA := exp _,
      mxA ::= exp _).
   8. Distributed and in-core versions of simple elementwise analogues of
      scala.math._. For example, for log(x) the convention is dlog(drm),
      mlog(mx), vlog(vec). Unfortunately we cannot overload these functions over
      what is done in scala.math, i.e. Scala would not allow log(mx) or log(drm)
      and log(Double) at the same time, mainly because they are being defined in
      different packages.
   9. Distributed and in-core first and second moment routines. R analogs:
      mean(), colMeans(), rowMeans(), variance(), sd(). By convention,
      distributed versions are prefixed with the letter d: colMeanVars(),
      colMeanStdevs(), dcolMeanVars(), dcolMeanStdevs().
   10. Distance and squared distance matrix routines. R analog: dist(). Provide
      both squared and non-squared Euclidean distance matrices. By convention,
      distributed versions are prefixed with the letter d: dist(x), sqDist(x),
      dsqDist(x). Also a variation for the pair-wise distance matrix of two
      different inputs x and y: sqDist(x,y), dsqDist(x,y).
   11. DRM row sampling API.
   12. Distributed performance bug fixes. This relates mostly to (a) matrix
      multiplication deficiencies, and (b) handling parallelism.
   13. Distributed engine-neutral allreduceBlock() operator API for Spark and
      H2O.
   14. Distributed optimizer operators for elementwise functions. Rewrites
      recognizing e.g. 1 + drmX * dexp(drmX) as a single fused elementwise
      physical operator: elementwiseFunc(f1(f2(drmX))) where f1 = 1 + x and
      f2 = exp(x).
   15. More cbind, rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX or the other
      way around) for Spark and H2O.
   16. Added +=: and *=: operators on vectors.
   17. Closeable API for broadcast tensors.
   18. Support for conversion of any type-keyed DRM into ordinally-keyed DRM.
   19. Scala logging style.
   20. rowSumsMap() summary for non-int-keyed DRMs.
   21. Elementwise power operator ^.
   22. R-like vector concatenation operator.
   23. In-core functional assignments, e.g.: mxA := { (x) => x * x}.
   24. Straighten out behavior of Matrix.iterator() and iterateNonEmpty().
   25. New mutable transposition view for in-core matrices. The in-core matrix
      transpose view was rewritten with mostly two goals in mind: (1) enable
      mutability, e.g. for (col <- mxA.t) col := k; (2) translate matrix
      structural flavor for optimizers correctly, i.e. new SparseRowMatrix.t
      carries on as column-major structure.
   26. Native support for kryo serialization of tensor types.
   27. Deprecation of the MultiLayerPerceptron, ConcatenateVectorsJob and all
      related classes.
   28. Deprecation of SparseColumnMatrix.
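
Item 10 in the list above mentions pair-wise squared distance routines such as
sqDist(x,y). As a rough illustration of what such a routine returns, here is a
sketch in plain Python using the identity
||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. The name sq_dist and the list-of-rows
representation are assumptions for this example, and the announcement does not
say whether Mahout computes the result this way internally.

```python
def sq_dist(x, y):
    """Pair-wise squared Euclidean distances between rows of x and rows of y.

    Returns a len(x) by len(y) matrix as a list of lists.
    """
    x_norms = [sum(v * v for v in row) for row in x]
    y_norms = [sum(v * v for v in row) for row in y]
    return [
        [
            # Clamp at zero: round-off can make the identity slightly negative.
            max(0.0, xn + yn - 2.0 * sum(a * b for a, b in zip(xr, yr)))
            for yr, yn in zip(y, y_norms)
        ]
        for xr, xn in zip(x, x_norms)
    ]


x = [[0, 0], [3, 4]]
y = [[0, 0], [1, 0]]
print(sq_dist(x, y))
```

Taking the elementwise square root of the result gives the non-squared
distance matrix, the analog of R's dist().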


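Item 14 in the list above describes fusing a chain of elementwise operations,
such as 1 + drmX * dexp(drmX), into a single physical operator. The sketch
below, in plain Python on an in-memory list of rows, only illustrates why that
helps: the unfused version materializes two intermediate matrices, while the
fused version makes one pass. The function names are hypothetical and this is
not Mahout's optimizer.

```python
import math


def unfused(mx):
    """Evaluate 1 + mx * exp(mx) one operator at a time (three passes)."""
    exp_mx = [[math.exp(v) for v in row] for row in mx]           # pass 1
    prod = [[a * b for a, b in zip(r1, r2)]                       # pass 2
            for r1, r2 in zip(mx, exp_mx)]
    return [[1 + v for v in row] for row in prod]                 # pass 3


def fused(mx):
    """Evaluate the same expression in a single traversal, no intermediates."""
    return [[1 + v * math.exp(v) for v in row] for row in mx]


mx = [[0.0, 1.0], [2.0, -1.0]]
print(fused(mx) == unfused(mx))
```

On a distributed matrix the same idea avoids creating an intermediate dataset
per operator.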



STATS

A total of 31 separate JIRA issues are addressed in this release [2] with 2
bugfixes.

Mahout 0.11.0-snapshot (targeted for August 5, 2015)

   1. Support for Spark 1.3 sequence file write.
   2. Mahout Spark shell support for Spark

Re: [VOTE] Apache Mahout 0.10.2 Release Candidate

2015-08-05 Thread Suneel Marthi
Cancelling the 0.10.2 Release; discovered a missing artifact which prevents
the release from going through.


On Tue, Aug 4, 2015 at 9:31 PM, Suneel Marthi smar...@apache.org wrote:

 Thanks for the votes. The voting for 0.10.2 is officially closed; we had 4
 +1 votes and no objections. Will send an announcement once the release is
 finalized.

 On Tue, Aug 4, 2015 at 8:36 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:

 This was user error, revoking my binding -1 vote.

 On Sun, Aug 2, 2015 at 5:22 PM, Andrew Musselman 
 andrew.mussel...@gmail.com
  wrote:

  -1 unless this is operator error on my part.
 
  $ gpg --verify Downloads/apache-mahout-distribution-0.10.2-src.zip.asc
  gpg: no signed data
  gpg: can't hash datafile: file open error
 
  On Sun, Aug 2, 2015 at 11:58 AM, Suneel Marthi smar...@apache.org
 wrote:
 
  If you folks have not read the email from last Friday that talks about both
  0.10.2 and 0.11.0 releases this week, I would suggest that you please do.
  The plan is to release both 0.10.2 and 0.11.0 this week.  Seems like we
  have some bandwidth in the PMC (at least per the last 2 emails on this
  thread) to push through another release today (I definitely don't have the
  time).  If someone else wants to push through 0.11.0, please do so.
 
 
 
 
  On Sun, Aug 2, 2015 at 1:27 PM, Andrew Musselman 
  andrew.mussel...@gmail.com
   wrote:
 
   Is there any reason not to release 11 too?
  
   On Sunday, August 2, 2015, Pat Ferrel p...@occamsmachete.com wrote:
  
+1 (binding) — do we have to say binding?
   
Why do we continue on Spark 1.2 when all distros have updated to Spark
1.3.1 long ago, and Spark has released 1.4 with 1.5 in the works. This is
rather incomprehensible to me since we have the master 0.11.0, running on
1.4 ready to release.

Can we please, please also release 0.11.0?

On Aug 1, 2015, at 9:35 PM, Andrew Palumbo ap@outlook.com wrote:

Verified source tar and zip, all tests pass.

Ran through all options of the classification and clustering examples in
the binary tar.gz distribution in pseudo-cluster mode for MR and Spark
without incident.

Ran through one option each in the .zip Classification and Clustering
examples in both pseudo-cluster and MAHOUT_LOCAL mode without incident.

Verified spark-document-classifier.mscala example from the spark-shell in
both .zip and .tar.gz binaries.

+1 (binding)

On 08/01/2015 12:44 AM, Suneel Marthi wrote:
 Verified {src} * {bin, tar} and all tests pass.

 +1 (binding)

 On Fri, Jul 31, 2015 at 11:56 PM, Suneel Marthi smar...@apache.org wrote:

 This is a call for Votes for Mahout 0.10.2 Release candidate available at
 https://repository.apache.org/content/repositories/orgapachemahout-1011

 Need at least 3 PMC +1 votes for the RC to pass. Voting runs until Sunday
 Aug 2, 2015.

 Please verify the following:

 1. Sigs and Hashes of Release artifacts (Ted/Drew/Grant/Stevo)
 2. AWS testing of {src, bin} * {tar, zip}  (Andrew ?)
 3. Integration testing of {src,bin} * {tar,zip} (Suneel/AP/)
 4. Run through Examples and scripts




   
   
   
  
 
 
 





Re: [VOTE] Apache Mahout 0.11.0 Release Candidate

2015-08-05 Thread Suneel Marthi
Tested {src} * {zip,tar} and all tests pass.

+1 (binding)

On Thu, Aug 6, 2015 at 1:13 AM, Andrew Musselman andrew.mussel...@gmail.com
 wrote:

 +1, already tested

 On Wed, Aug 5, 2015 at 9:44 PM, Suneel Marthi smar...@apache.org wrote:

  This is the vote for release 0.11.0 of Apache Mahout.
 
  The vote will be going for at least 72 hours and will be closed on Thursday,
  August 6th, 2015.  Please download, test and vote with
 
   [ ] +1, accept RC as the official 0.11.0 release of Apache Mahout
  [ ] +0, I don't care either way,
  [ ] -1, do not accept RC as the official 0.11.0 release of Apache Mahout,
  because...
 
  Maven staging repo:
 
  https://repository.apache.org/content/repositories/orgapachemahout-1016
  https://repository.apache.org/content/repositories/orgapachemahout-1015
 
 
  These artifacts are the same as the previous 0.11.0 artifacts and there
  have been no code changes. If you have already tested the previous
  artifacts, please cast your votes again and we'll finalize the release once
  we have at least 3 PMC +1 votes.
 



[VOTE] Apache Mahout 0.11.0 Release Candidate

2015-08-05 Thread Suneel Marthi
This is the vote for release 0.11.0 of Apache Mahout.

The vote will be going for at least 72 hours and will be closed on Thursday,
August 6th, 2015.  Please download, test and vote with

 [ ] +1, accept RC as the official 0.11.0 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.11.0 release of Apache Mahout,
because...

Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1016
https://repository.apache.org/content/repositories/orgapachemahout-1015

These artifacts are the same as the previous 0.11.0 artifacts and there
have been no code changes. If you have already tested the previous
artifacts, please cast your votes again and we'll finalize the release once
we have at least 3 PMC +1 votes.


[VOTE] Apache Mahout 0.10.2 Release

2015-08-05 Thread Suneel Marthi
This is the vote for release 0.10.2 of Apache Mahout.

The vote will be going for at least 72 hours and will be closed on Thursday,
August 6th, 2015.  Please download, test and vote with

 [ ] +1, accept RC as the official 0.10.2 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.10.2 release of Apache Mahout,
because...

Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1015

This vote differs from the previous one in that it packages a missing
artifact. If you have already tested the previous artifacts, please cast your
votes again and we'll finalize the release as soon as we have at least 3 PMC
+1 votes.


Re: [VOTE] Apache Mahout 0.10.2 Release

2015-08-05 Thread Suneel Marthi
Tested the examples from {src, bin} in pseudo-cluster mode and all tests
pass.

Here's my +1 (binding)

On Wed, Aug 5, 2015 at 8:02 PM, Suneel Marthi smar...@apache.org wrote:

 This is the vote for release 0.10.2 of Apache Mahout.

 The vote will be going for at least 72 hours and will be closed on Thursday,
 August 6th, 2015.  Please download, test and vote with

  [ ] +1, accept RC as the official 0.10.2 release of Apache Mahout
 [ ] +0, I don't care either way,
 [ ] -1, do not accept RC as the official 0.10.2 release of Apache Mahout,
 because...

 Maven staging repo:

 https://repository.apache.org/content/repositories/orgapachemahout-1015

 This vote differs from the previous one in that it packages a missing
 artifact. If you have already tested the previous artifacts, please cast your
 votes again and we'll finalize the release as soon as we have at least 3 PMC
 +1 votes.



Re: [VOTE] Apache Mahout 0.10.2 Release Candidate

2015-08-04 Thread Suneel Marthi
Thanks for the votes. The voting for 0.10.2 is officially closed; we had 4
+1 votes and no objections. Will send an announcement once the release is
finalized.

On Tue, Aug 4, 2015 at 8:36 PM, Andrew Musselman andrew.mussel...@gmail.com
 wrote:

 This was user error, revoking my binding -1 vote.

 On Sun, Aug 2, 2015 at 5:22 PM, Andrew Musselman 
 andrew.mussel...@gmail.com
  wrote:

  -1 unless this is operator error on my part.
 
  $ gpg --verify Downloads/apache-mahout-distribution-0.10.2-src.zip.asc
  gpg: no signed data
  gpg: can't hash datafile: file open error
 
  On Sun, Aug 2, 2015 at 11:58 AM, Suneel Marthi smar...@apache.org
 wrote:
 
  If you folks have not read the email from last Friday that talks about both
  0.10.2 and 0.11.0 releases this week, I would suggest that you please do.
  The plan is to release both 0.10.2 and 0.11.0 this week.  Seems like we
  have some bandwidth in the PMC (at least per the last 2 emails on this
  thread) to push through another release today (I definitely don't have the
  time).  If someone else wants to push through 0.11.0, please do so.
 
 
 
 
  On Sun, Aug 2, 2015 at 1:27 PM, Andrew Musselman 
  andrew.mussel...@gmail.com
   wrote:
 
   Is there any reason not to release 11 too?
  
   On Sunday, August 2, 2015, Pat Ferrel p...@occamsmachete.com wrote:
  
+1 (binding) — do we have to say binding?
   
Why do we continue on Spark 1.2 when all distros have updated to Spark
1.3.1 long ago, and Spark has released 1.4 with 1.5 in the works. This is
rather incomprehensible to me since we have the master 0.11.0, running on
1.4 ready to release.

Can we please, please also release 0.11.0?

On Aug 1, 2015, at 9:35 PM, Andrew Palumbo ap@outlook.com wrote:

Verified source tar and zip, all tests pass.

Ran through all options of the classification and clustering examples in
the binary tar.gz distribution in pseudo-cluster mode for MR and Spark
without incident.

Ran through one option each in the .zip Classification and Clustering
examples in both pseudo-cluster and MAHOUT_LOCAL mode without incident.

Verified spark-document-classifier.mscala example from the spark-shell in
both .zip and .tar.gz binaries.

+1 (binding)

On 08/01/2015 12:44 AM, Suneel Marthi wrote:
 Verified {src} * {bin, tar} and all tests pass.

 +1 (binding)

 On Fri, Jul 31, 2015 at 11:56 PM, Suneel Marthi smar...@apache.org wrote:

 This is a call for Votes for Mahout 0.10.2 Release candidate available at
 https://repository.apache.org/content/repositories/orgapachemahout-1011

 Need at least 3 PMC +1 votes for the RC to pass. Voting runs until Sunday
 Aug 2, 2015.

 Please verify the following:

 1. Sigs and Hashes of Release artifacts (Ted/Drew/Grant/Stevo)
 2. AWS testing of {src, bin} * {tar, zip}  (Andrew ?)
 3. Integration testing of {src,bin} * {tar,zip} (Suneel/AP/)
 4. Run through Examples and scripts




   
   
   
  
 
 
 



[VOTE] Apache Mahout 0.11.0 Release Candidate

2015-08-03 Thread Suneel Marthi
This is the vote for release 0.11.0 of Apache Mahout.

The vote will be going for at least 72 hours and will be closed on Wednesday,
August 6th, 2015.  Please download, test and vote with

[ ] +1, accept RC as the official 0.11.0 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.11.0 release of Apache Mahout,
because...


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1013/

The git tag to be voted upon is release-0.11.0


Re: [VOTE] Apache Mahout 0.10.2 Release Candidate

2015-08-02 Thread Suneel Marthi
If you folks have not read the email from last Friday that talks about both
0.10.2 and 0.11.0 releases this week, I would suggest that you please do.
The plan is to release both 0.10.2 and 0.11.0 this week.  Seems like we
have some bandwidth in the PMC (at least per the last 2 emails on this
thread) to push through another release today (I definitely don't have the
time).  If someone else wants to push through 0.11.0, please do so.




On Sun, Aug 2, 2015 at 1:27 PM, Andrew Musselman andrew.mussel...@gmail.com
 wrote:

 Is there any reason not to release 11 too?

 On Sunday, August 2, 2015, Pat Ferrel p...@occamsmachete.com wrote:

  +1 (binding) — do we have to say binding?
 
  Why do we continue on Spark 1.2 when all distros have updated to Spark
  1.3.1 long ago, and Spark has released 1.4 with 1.5 in the works. This is
  rather incomprehensible to me since we have the master 0.11.0, running on
  1.4 ready to release.
 
  Can we please, please also release 0.11.0?
 
  On Aug 1, 2015, at 9:35 PM, Andrew Palumbo ap@outlook.com wrote:
 
  Verified source tar and zip, all tests pass.
 
  Ran through all options of the classification and clustering examples in
  the binary tar.gz distribution in pseudo-cluster mode for MR and Spark
  without incident.
 
  Ran through one option each in the .zip  Classification and Clustering
  examples in both pseudo-cluster and MAHOUT_LOCAL mode without incident.
 
  Verified spark-document-classifier.mscala example from the spark-shell in
  both .zip and .tar.gz binaries.
 
  +1 (binding)
 
  On 08/01/2015 12:44 AM, Suneel Marthi wrote:
   Verified {src} * {bin, tar} and all tests pass.
  
   +1 (binding)
  
  
  
   On Fri, Jul 31, 2015 at 11:56 PM, Suneel Marthi smar...@apache.org wrote:

   This is a call for Votes for Mahout 0.10.2 Release candidate available at
   https://repository.apache.org/content/repositories/orgapachemahout-1011

   Need at least 3 PMC +1 votes for the RC to pass. Voting runs until Sunday
   Aug 2, 2015.

   Please verify the following:

   1. Sigs and Hashes of Release artifacts (Ted/Drew/Grant/Stevo)
   2. AWS testing of {src, bin} * {tar, zip}  (Andrew ?)
   3. Integration testing of {src,bin} * {tar,zip} (Suneel/AP/)
   4. Run through Examples and scripts
  
  
  
  
 
 
 



[VOTE] Apache Mahout 0.11.0 Release candidate

2015-08-02 Thread Suneel Marthi
This is the vote for release 0.11.0 of Apache Mahout.

The vote will be going for at least 72 hours and will be closed on Wednesday,
August 5th, 2015.  Please download, test and vote with

[ ] +1, accept RC as the official 0.11.0 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.11.0 release of Apache Mahout,
because...


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachemahout-1012
https://repository.apache.org/content/repositories/orgapachemahout-1012/org/apache/mahout/mahout-distribution/0.11.0/
https://repository.apache.org/content/repositories/orgapachebigtop-1001

The git tag to be voted upon is release-0.11.0


Re: [VOTE] Apache Mahout 0.10.2 Release Candidate

2015-07-31 Thread Suneel Marthi
Verified {src} * {bin, tar} and all tests pass.

+1 (binding)



On Fri, Jul 31, 2015 at 11:56 PM, Suneel Marthi smar...@apache.org wrote:

 This is a call for Votes for Mahout 0.10.2 Release candidate available at
 https://repository.apache.org/content/repositories/orgapachemahout-1011

 Need at least 3 PMC +1 votes for the RC to pass. Voting runs until Sunday
 Aug 2, 2015.


 Please verify the following:

 1. Sigs and Hashes of Release artifacts (Ted/Drew/Grant/Stevo)
 2. AWS testing of {src, bin} * {tar, zip}  (Andrew ?)
 3. Integration testing of {src,bin} * {tar,zip} (Suneel/AP/)
 4. Run through Examples and scripts






[VOTE] Apache Mahout 0.10.2 Release Candidate

2015-07-31 Thread Suneel Marthi
This is a call for Votes for Mahout 0.10.2 Release candidate available at
https://repository.apache.org/content/repositories/orgapachemahout-1011

Need at least 3 PMC +1 votes for the RC to pass. Voting runs until Sunday
Aug 2, 2015.


Please verify the following:

1. Sigs and Hashes of Release artifacts (Ted/Drew/Grant/Stevo)
2. AWS testing of {src, bin} * {tar, zip}  (Andrew ?)
3. Integration testing of {src,bin} * {tar,zip} (Suneel/AP/)
4. Run through Examples and scripts
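
For the hash part of item 1, the check can be scripted. What follows is a
minimal Python sketch, not part of the Mahout release tooling: the function
names are hypothetical, the default algorithm is only an assumption (use
whichever one the published checksum file was made with), and the published
digest is assumed to be passed in as a plain hex string. Only the standard
hashlib module is used. Signature (.asc) verification is a separate step done
with gpg.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large release archives need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published, algorithm="sha256"):
    """Compare the computed digest with a published hex digest string.

    Surrounding whitespace and letter case in `published` are ignored.
    """
    return file_digest(path, algorithm) == published.strip().lower()
```

A caller would run matches_published_digest once per artifact in
{src, bin} * {tar, zip} and treat any False result as a failed check.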


Re: deprecation of lucene2seq

2015-07-03 Thread Suneel Marthi
Please also note that MultiLayerPerceptron and ConcatenateVectorsJob, which
were marked as deprecated in 0.10.0, will be purged in the upcoming 0.10.2
release planned for July 10.

On Thu, Jul 2, 2015 at 4:13 PM, Andrew Palumbo ap@outlook.com wrote:

 Please note that mahout lucene2seq and all related classes will be
 deprecated as of the upcoming Mahout 0.10.2 release.

 Thank You,

 Andy





Re: FP-Growth deprecated

2015-06-24 Thread Suneel Marthi
FP-Growth has been deprecated-removed-deprecated since 0.8. It will be removed
completely in the subsequent release as it has not been maintained for the
past 5 releases.

Yes, you would have to use 0.8 if you are still looking to use it, but it's
not supported anymore as the project has moved away from MapReduce.

Sent from my iPhone

 On Jun 24, 2015, at 9:39 AM, Pat Ferrel p...@occamsmachete.com wrote:
 
 What do you mean by MR
 
 On Jun 22, 2015, at 11:22 PM, guo.weizhan guo.weiz...@gmail.com wrote:
 
 We are using MR
 
 On June 22, 2015, at 8:37 AM, Pat Ferrel p...@occamsmachete.com wrote:
 
 What is your application? 
 
 On Jun 17, 2015, at 7:06 AM, guo weizhan guo.weiz...@gmail.com wrote:
 
 Hi All, 
 
 I found that FP-Growth has been deprecated since 0.8, but we want this
 algorithm to do association analysis. Do I have to use the old version, or is
 there any other association analysis I can use in the latest version?
 
 
 Thanks, 
 Guo 
 
 


Re: Streaming K-means

2015-06-17 Thread Suneel Marthi
Dmitriy is correct in that Streaming KMeans in MLlib is a misleading name
for something that was meant to convey Spark Streaming + KMeans.

The Mahout Streaming KMeans is an implementation of the Meyerson paper
that's been referred to in Dmitriy's email.

I have had folks misconstrue Streaming KMeans as being Spark
Streaming + KMeans, thanks to the bad naming on the part of the MLlib folks.

I had spoken to Jeremy Freeman, the MLlib Streaming KMeans contributor,
about this and he agrees that the intent was to convey Spark Streaming +
KMeans; he definitely wasn't aware of the Streaming KMeans algorithm that
existed long before Spark Streaming.



On Tue, Jun 16, 2015 at 5:34 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:

 Streaming k-means is something else afaik. Streaming k-means is reserved
 for a particular k-means method (in Mahout, at least, [1]).

 Whereas, as far as I understand, what MLlib calls streaming k-means is a name
 given by the MLlib contributor which really means online k-means, i.e. radar
 tracking of centroids over time over a stream of (x_i, t_i) pairs; that
 uses Spark Streaming, but has nothing to do with the Shindler et al. method.

 At least that was our understanding last time we looked at the issue of the
 names here.

 So this issue has come up several times already when people come and say,
 what? streaming k-means? MLlib has streaming k-means whereas everybody
 else is talking about something else.

 [1]

 http://papers.nips.cc/paper/4362-fast-and-accurate-k-means-for-large-datasets.pdf

 On Tue, Jun 16, 2015 at 2:04 PM, RJ Nowling rnowl...@gmail.com wrote:

  There is a streaming k-means implementation in MLlib that uses reservoir
  sampling.
 
  On Tue, Jun 2, 2015 at 2:24 AM, Marko Dinic marko.di...@nissatech.com
  wrote:
 
   Ted,
  
   Thank you for your answer.
  
   What would you then recommend me to do? My idea is to implement it to
   enable clustering of time series using DTW (Dynamic Time Warping) as
   distance measure. As you know, the main problem is that K-medoids is not
   scalable, so that's standing in my way. Of course, it could be used with
   other distances as well.

   I have already implemented something that I consider a scalable K-medoids,
   based on using pivots to speed up medoid selection (
   https://seer.lcc.ufmg.br/index.php/jidm/article/viewFile/99/82). This
   works for distance measures such as Euclidean, has some limitations (best
   results are in case of normal distribution, outliers could be a problem),
   but it works pretty well (considering the computations). The thing is, it
   can't be used with DTW, since it relies on projections, while triangle
   inequality for DTW does not hold. That is why I'm considering this
   Streaming approach now.

   Would you think that it is worth giving a shot? I'm really stretching
   for a scalable solution.
  
   Best regards,
   Marko
  
  
   On Tue 02 Jun 2015 12:03:40 AM CEST, Ted Dunning wrote:
  
   The streaming k-means works by building a sketch of the data which is then
   used to do real clustering.

   It might be that this sketch would be acceptable to do k-medoids, but that
   is definitely not guaranteed.

   Similarly, it might be possible to build a medoid sketch instead of a mean
   based sketch, but this is also unexplored ground.

   The virtue of the first approach (using a m-means sketch as input to
   k-medoids) would be that it would make the k-medoids scalable.
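
The sketch-then-cluster idea can be illustrated with a deliberately
simplified one-pass example in Python. This is not Mahout's StreamingKMeans
and not the algorithm from the paper cited in this thread: the fixed distance
cutoff, the greedy nearest-centroid merge, and all names are assumptions
chosen for brevity (math.dist needs Python 3.8 or later). It only shows the
general shape, where a single pass reduces the data to a small set of
weighted centroids.

```python
import math


def build_sketch(points, cutoff):
    """One pass over `points`, returning a list of [centroid, weight] pairs.

    A point closer than `cutoff` to its nearest centroid is folded into that
    centroid as a weighted mean; otherwise it starts a new centroid.
    """
    sketch = []
    for point in points:
        # Linear scan for the nearest existing centroid.
        nearest = None
        nearest_dist = math.inf
        for entry in sketch:
            dist = math.dist(entry[0], point)
            if dist < nearest_dist:
                nearest, nearest_dist = entry, dist
        if nearest is not None and nearest_dist < cutoff:
            centroid, weight = nearest
            new_weight = weight + 1
            # Update the running weighted mean of the merged centroid.
            nearest[0] = tuple(
                (c * weight + p) / new_weight for c, p in zip(centroid, point)
            )
            nearest[1] = new_weight
        else:
            sketch.append([tuple(point), 1])
    return sketch
```

The weighted centroids would then be the input to a conventional clustering
step. Whether such a mean-based sketch is adequate input for k-medoids is
exactly the open question raised above.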
  
  
  
   On Mon, Jun 1, 2015 at 1:04 PM, Marko Dinic 
 marko.di...@nissatech.com
   wrote:
  
Hello everyone,
  
   I have an idea and I would like to get validation from the community about
   it.

   In Mahout there is an implementation of Streaming K-means. I'm interested
   in your opinion: would it make sense to make a similar implementation of
   Streaming K-medoids?

   K-medoids has even bigger problems than K-means because it's not scalable,
   but can be useful in some cases (e.g. it allows more sophisticated distance
   measures).

   What is your opinion about implementation of this?
  
   Best regards,
   Marko
  
  
  
 



Re: Cannot Compile Mahout Math Module

2015-06-17 Thread Suneel Marthi
You should be using Java 7.

Sent from my iPhone

 On Jun 17, 2015, at 10:47 AM, Prasad Priyadarshana Fernando 
 bpp...@gmail.com wrote:
 
 Hi,
 
 The Mahout Math module has compilation issues. Does anyone know the root
 cause?
 
 Thanks
 
 
 Information:Using javac 1.8.0_45 to compile java sources
 Information:java: Errors occurred while compiling module 'mahout-math'
 Information:6/17/15 10:43 AM - Compilation completed with 100 errors and 0
 warnings in 6s 7ms
 /home/prasad/SrcCodes/apache-mahout-distribution-0.10.1/math/src/main/java/org/apache/mahout/math/Sorting.java
 Error:(24, 39) java: cannot find symbol
  symbol:   class ByteComparator
  location: package org.apache.mahout.math.function
 Error:(25, 39) java: cannot find symbol
  symbol:   class CharComparator
  location: package org.apache.mahout.math.function
 Error:(26, 39) java: cannot find symbol
  symbol:   class DoubleComparator
  location: package org.apache.mahout.math.function
 Error:(27, 39) java: cannot find symbol
  symbol:   class FloatComparator
  location: package org.apache.mahout.math.function
 Error:(28, 39) java: cannot find symbol
  symbol:   class IntComparator
  location: package org.apache.mahout.math.function
 Error:(29, 39) java: cannot find symbol
  symbol:   class LongComparator
  location: package org.apache.mahout.math.function
 Error:(30, 39) java: cannot find symbol
  symbol:   class ShortComparator
  location: package org.apache.mahout.math.function
 Error:(52, 62) java: cannot find symbol
  symbol:   class ByteComparator
  location: class org.apache.mahout.math.Sorting
 Error:(64, 62) java: cannot find symbol
  symbol:   class CharComparator
  location: class org.apache.mahout.math.Sorting
 Error:(77, 7) java: cannot find symbol
  symbol:   class DoubleComparator
  location: class org.apache.mahout.math.Sorting
 Error:(90, 7) java: cannot find symbol
  symbol:   class FloatComparator
  location: class org.apache.mahout.math.Sorting
 Error:(102, 61) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(123, 48) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(132, 62) java: cannot find symbol
  symbol:   class LongComparator
  location: class org.apache.mahout.math.Sorting
 Error:(145, 7) java: cannot find symbol
  symbol:   class ShortComparator
  location: class org.apache.mahout.math.Sorting
 Error:(174, 7) java: cannot find symbol
  symbol:   class ByteComparator
  location: class org.apache.mahout.math.Sorting
 Error:(196, 68) java: cannot find symbol
  symbol:   class ByteComparator
  location: class org.apache.mahout.math.Sorting
 Error:(292, 52) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(297, 54) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(419, 57) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(440, 66) java: cannot find symbol
  symbol:   class CharComparator
  location: class org.apache.mahout.math.Sorting
 Error:(446, 68) java: cannot find symbol
  symbol:   class CharComparator
  location: class org.apache.mahout.math.Sorting
 Error:(543, 68) java: cannot find symbol
  symbol:   class DoubleComparator
  location: class org.apache.mahout.math.Sorting
 Error:(549, 70) java: cannot find symbol
  symbol:   class DoubleComparator
  location: class org.apache.mahout.math.Sorting
 Error:(645, 67) java: cannot find symbol
  symbol:   class FloatComparator
  location: class org.apache.mahout.math.Sorting
 Error:(651, 69) java: cannot find symbol
  symbol:   class FloatComparator
  location: class org.apache.mahout.math.Sorting
 Error:(747, 65) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(753, 67) java: cannot find symbol
  symbol:   class IntComparator
  location: class org.apache.mahout.math.Sorting
 Error:(849, 66) java: cannot find symbol
  symbol:   class LongComparator
  location: class org.apache.mahout.math.Sorting
 Error:(855, 68) java: cannot find symbol
  symbol:   class LongComparator
  location: class org.apache.mahout.math.Sorting
 Error:(1073, 67) java: cannot find symbol
  symbol:   class ShortComparator
  location: class org.apache.mahout.math.Sorting
 Error:(1079, 69) java: cannot find symbol
  symbol:   class ShortComparator
  location: class org.apache.mahout.math.Sorting
 Error:(1317, 24) java: cannot find symbol
  symbol:   class ByteComparator
  location: class org.apache.mahout.math.Sorting
 Error:(1341, 66) java: cannot find symbol
  symbol:   class ByteComparator
  location: class org.apache.mahout.math.Sorting
 Error:(1347, 76) java: cannot find symbol
  symbol:   class ByteComparator
  location: class org.apache.mahout.math.Sorting
 Error:(1409, 72) java: cannot find symbol
  symbol:   class 

Re: Updated AMI for EMR

2015-06-01 Thread Suneel Marthi
It's highly likely that there will be another 0.10.x out by July; will they be
pulling the latest?

On Mon, Jun 1, 2015 at 2:18 PM, Andrew Musselman andrew.mussel...@gmail.com
 wrote:

 AWS will be releasing a new AMI in July that will include our 0.10.1
 release.



Re: [VOTE] Mahout 0.10.1 Release Candidate

2015-05-31 Thread Suneel Marthi
Thanks Stevo.

This release has passed with 5 +1 PMC binding votes and the voting is
officially closed. Will send the release announcement later today.



On Sun, May 31, 2015 at 3:35 PM, Stevo Slavić ssla...@gmail.com wrote:

 +1 (binding)

 Verified hashes and signatures; distribution sources tarball and zip unpack
 well, build passes from unpacked sources.

 On Sun, May 31, 2015 at 8:34 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:

  +1 (binding)
 
  Verified tests pass for src tarball and zip; I'm comfortable skipping EMR
  smoke testing for a point release given team opinion that it's not
  required.
 
  On Sun, May 31, 2015 at 9:43 AM, Andrew Palumbo ap@outlook.com
  wrote:
 
   +1 (binding)
  
   Ran (on Hadoop 2.4.1 + spark 1.2.1) all examples with all options in the
   |.tar.gz| binary archive in pseudo-cluster mode and one with
   MAHOUT_LOCAL=true with only the previously noted minor data issue, which I
   agree can wait for the next release.
  
   Ran a mix and match of the |.zip| binary archive examples with
   MAHOUT_LOCAL=true and in pseudo-cluster mode without issue.
  
   Tested the shell from both archives for qr and matrix display fixes.
  
  
  
   On 05/31/2015 12:09 PM, Pat Ferrel wrote:
  
   +1 (binding)
  
   Verified on Spark 1.3 pseudo-clustered HDFS 2.4

   There are some example data cleanup issues that can wait for the next
   release.
  
  
   On May 30, 2015, at 8:16 PM, Suneel Marthi smar...@apache.org
 wrote:
  
   Verified build and tests locally for {source} * {zip, tar}. No issues
   found.
  
   +1 (binding)
  
   On Sat, May 30, 2015 at 11:14 PM, Suneel Marthi smar...@apache.org
   wrote:
  
   Andrew Palumbo / Dmitriy:  Please also verify the various scenarios as
   described in M-1693
  
   On Sat, May 30, 2015 at 10:32 PM, Suneel Marthi smar...@apache.org
   wrote:
  
Here's the new 0.10.1 Release Candidate
  
  
  
  
 
 https://repository.apache.org/content/repositories/orgapachemahout-1009/org/apache/mahout/apache-mahout-distribution/0.10.1/
  
   The Voting ends on Sunday, May 31 2015.
  
   Need a +1 from the PMC for each of the line items below for the release
   to pass.

   1. Ted/Grant:  Verify hashes and checksums - {binary,source} x {zip,tar}
   + pom

   2. AKM:  Verify examples on EMR  - {binary, source} * {zip, tar}

   3. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar}

   4. Suneel: Verify build and tests - {source} * {zip, tar}

   5. Pat:  Verify examples locally - {source} * {zip, tar}

   The LICENSE and NOTICE files have not been updated this time and will be
   addressed in future releases.
  
  
  
   On Sat, May 30, 2015 at 8:32 PM, Suneel Marthi 
  suneel.mar...@gmail.com
   
   wrote:
  
   Please hold your votes, will be refreshing staging with another build in
   the next hour
  
   On Sat, May 30, 2015 at 8:31 PM, Andrew Musselman 
   andrew.mussel...@gmail.com wrote:
  
Likewise source zip and tarballs build and pass tests.
  
   On Sat, May 30, 2015 at 3:23 PM, Suneel Marthi 
 smar...@apache.org
   wrote:
  
Verified {source} * {zip, tar} and all tests pass.
  
   +1 (binding)
  
   On Sat, May 30, 2015 at 5:28 PM, Suneel Marthi 
 smar...@apache.org
  
  
   wrote:
  
   This is a call for VOTE to pass Mahout 0.10.1 release candidate that's
   available at

   https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/

   Need at least 3 PMC +1 (binding) votes to cut the release

   Below are the tasks breakdown for the PMC and committers:

   Andy Palumbo & Pat Ferrel: verify the binary artifacts and run tests

   Suneel & AKM:  verify the src artifacts

   Ted/Grant/Drew: verify the hashes and Sigs

   The LICENSE.txt and NOTICE.txt still need to be updated and will not be
   addressed as part of 0.10.1 release.
  
  
  
  
  
  
 



[ANNOUNCE] Apache Mahout 0.10.1 Released

2015-05-31 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.10.1.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

The Mahout Math environment we call “Samsara” for its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs most fully on
Spark.

To get started with Apache Mahout 0.10.1, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.10.1/.


Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.


RELEASE HIGHLIGHTS

This is an incremental minor release over Mahout 0.10.0 meant to fix a few
bugs and upgrade to Spark 1.2.2 or earlier.

Mahout 0.10.1

   1. This release fixes a major memory usage bug in co-occurrence analysis
      used by the driver spark-itemsimilarity (MAHOUT-1707). This will now
      require far less memory in the executor.
   2. Support Spark 1.2.2 or earlier - due to a bug in Spark 1.2+ in the
      JavaSerializer (SPARK-6069) we removed the use of Guava from any code
      executed in Spark Executors. To do this we created a Scala Collections
      based BiMap so any example code showing how to use the old Guava
      collections is obsolete.
   3. Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.
   4. Trim down packages size to < 200MB - MAHOUT-1704.
   5. Minor testing indicates binary compatibility with Spark 1.3 except for
      the Mahout Shell, which does not run.


STATS

A total of 9 separate JIRA issues are addressed in this release [2] with 5
bugfixes.


Scope of Mahout 0.10.2 ~ targeted for June 28, 2015

   1. In-core transpose view rewrites. Modifiable transpose views
      (for (col <- a.t) col := 5).
   2. Matrix structure flavor additions (understand general matrix structure
      and stride direction).
   3. %*% optimization based on matrix flavors.
   4. In-core ::= sparse assignment functions.
   5. Assign := optimization (do proper traversal based on matrix flavors,
      similarly to %*%).
   6. Adding in-place elementwise functional assignment (e.g. mxA := exp _,
      mxA ::= exp _).
   7. Distributed and in-core versions of simple elementwise analogues of
      scala.math._. For example, for log(x) the convention is dlog(drm),
      mlog(mx), vlog(vec). Unfortunately we cannot overload these functions
      over what is done in scala.math, i.e. Scala would not allow log(mx) or
      log(drm) and log(Double) at the same time, mainly because they are
      defined in different packages.
   8. Distributed performance bug fixes. This relates mostly to (a) matrix
      multiplication deficiencies, and (b) handling parallelism.
   9. Distributed allreduceBlock predicate.
   10. Distributed optimizer operators for elementwise functions. Rewrites
       recognizing e.g. 1 + drmX * dexp(drmX) as a single fused elementwise
       physical operator.
   11. More cbind, rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX or the other
       way around).
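
Item 10 above can be illustrated outside Mahout. The following is only a
sketch in plain Python over nested lists, not Mahout's optimizer or DSL; the
helper names (map_matrix, unfused, fused) are made up for illustration. It
shows why recognizing 1 + X * exp(X) as one elementwise operator helps: the
fused form makes a single pass and allocates no intermediate matrices.

```python
import math


def map_matrix(m, f):
    """Apply f to every element of a matrix stored as a list of rows."""
    return [[f(x) for x in row] for row in m]


def unfused(m):
    """Evaluate 1 + m * exp(m) as three separate elementwise passes."""
    e = map_matrix(m, math.exp)                      # pass 1: exp(m)
    p = [[x * y for x, y in zip(row_m, row_e)]       # pass 2: m * exp(m)
         for row_m, row_e in zip(m, e)]
    return map_matrix(p, lambda v: 1 + v)            # pass 3: 1 + ...


def fused(m):
    """Evaluate the same expression in a single pass, no intermediates."""
    return map_matrix(m, lambda x: 1 + x * math.exp(x))
```

Both functions return the same values; the difference is only in the number
of traversals and temporary matrices, which is what matters once the matrix
is distributed.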



Mahout 0.11.0-snapshot (ongoing, but available)

   1. Support for Spark 1.3 sequence file write.
   2. Spark Shell (timing TBD).
   3. First release that would see integration of Apache Mahout with Apache
      Flink as a backend.


GETTING STARTED

Download the release artifacts and signatures at
http://www.apache.org/dist/mahout/0.10.1/
The examples directory contains several working examples of the core
functionality available in Mahout. These can be run via scripts in the
examples/bin directory. Most examples do not need a Hadoop cluster in order
to run.

FUTURE PLANS

We will continue bug fixes and enhancements on the 0.10.x branch, which
will remain dependent on Spark 1.2.x. Support for Spark 1.3 will be in the
master branch reflecting Mahout-0.11.0-SNAPSHOT. To see progress on this
branch look here: https://github.com/apache/mahout/commits/master.  As of
this writing it is not ready yet to build for Spark 1.3.

Integration with Apache Flink is in the works in collaboration with TU
Berlin and Data Artisans to add Flink as the 3rd execution engine to
Mahout. This would be in addition to existing Apache Spark and H2O engines.

CONTRIBUTING


If you are interested in contributing, please see our How to Contribute
http://mahout.apache.org/developers/how-to-contribute.html[3] page or
contact us via email at 

[VOTE] Mahout 0.10.1 Release Candidate

2015-05-30 Thread Suneel Marthi
This is a call for VOTE to pass Mahout 0.10.1 release candidate that's
available at

https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/

Need at least 3 PMC +1 (binding) votes to cut the release.

Below is the task breakdown for the PMC and committers:

Andy Palumbo & Pat Ferrel: verify the binary artifacts and run tests

Suneel & AKM:  verify the src artifacts

Ted/Grant/Drew: verify the hashes and sigs

The LICENSE.txt and NOTICE.txt still need to be updated and will not be
addressed as part of the 0.10.1 release.


Re: [VOTE] Mahout 0.10.1 Release Candidate

2015-05-30 Thread Suneel Marthi
Andrew Palumbo / Dmitriy:  Please also verify the various scenarios as
described in M-1693

On Sat, May 30, 2015 at 10:32 PM, Suneel Marthi smar...@apache.org wrote:

 Here's the new 0.10.1 Release Candidate


 https://repository.apache.org/content/repositories/orgapachemahout-1009/org/apache/mahout/apache-mahout-distribution/0.10.1/

 The Voting ends on Sunday, May 31 2015.

 Need a +1 from the PMC for each of the line items below for the release to
 pass.

 1. Ted/Grant:  Verify hashes and checksums - {binary,source} x {zip,tar} +
 pom

 2. AKM:  Verify examples on EMR  - {binary, source} * {zip, tar}

 3. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar}

 4. Suneel: Verify build and tests - {source} * {zip, tar}

 5. Pat:  Verify examples locally - {source} * {zip, tar}

 The LICENSE and NOTICE files have not been updated this time and will be
 addressed in future releases.



 On Sat, May 30, 2015 at 8:32 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:

 Please hold ur votes, will be refreshing staging with another build in
 the next hour

 On Sat, May 30, 2015 at 8:31 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:

 Likewise source zip and tarballs build and pass tests.

 On Sat, May 30, 2015 at 3:23 PM, Suneel Marthi smar...@apache.org
 wrote:

  Verified {source} * {zip, tar} and all tests pass.
 
  +1 (binding)
 
  On Sat, May 30, 2015 at 5:28 PM, Suneel Marthi smar...@apache.org
 wrote:
 
   This is a call for VOTE to pass Mahout 0.10.1 release candidate
 that's
   available at
  
  
  
 
 https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/
  
   Need atleast 3 PMC +1 (binding) votes to cut the release
  
   Below are the tasks breakdown for the PMC and committers:
  
   Andy Palumbo  Pat Ferrel: verify the binary artifacts and run tests
  
   Suneel  AKM:  verify the src artifacts
  
   Ted/Grant/Drew: verify the hashes and Sigs
  
   The LICENSE.txt and NOTICE.txt still need to be updated and will not
 be
   addressed as part of 0.10.1 release.
  
  
  
 






Re: [VOTE] Mahout 0.10.1 Release Candidate

2015-05-30 Thread Suneel Marthi
Verified build and tests locally for {source} * {zip, tar}. No issues found.

+1 (binding)

On Sat, May 30, 2015 at 11:14 PM, Suneel Marthi smar...@apache.org wrote:

 Andrew Palumbo / Dmitriy:  Please also verify the various scenarios as
 described in M-1693

 On Sat, May 30, 2015 at 10:32 PM, Suneel Marthi smar...@apache.org
 wrote:

 Here's the new 0.10.1 Release Candidate


 https://repository.apache.org/content/repositories/orgapachemahout-1009/org/apache/mahout/apache-mahout-distribution/0.10.1/

 The Voting ends on Sunday, May 31 2015.

 Need a +1 from the PMC for each of the line items below for the release
 to pass.

 1. Ted/Grant:  Verify hashes and checksums - {binary,source} x {zip,tar}
 + pom

 2. AKM:  Verify examples on EMR  - {binary, source} * {zip, tar}

 3. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar}

 4. Suneel: Verify build and tests - {source} * {zip, tar}

 5. Pat:  Verify examples locally - {source} * {zip, tar}

 The LICENSE and NOTICE files have not been updated this time and will be
 addressed in future releases.



 On Sat, May 30, 2015 at 8:32 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:

 Please hold ur votes, will be refreshing staging with another build in
 the next hour

 On Sat, May 30, 2015 at 8:31 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:

 Likewise source zip and tarballs build and pass tests.

 On Sat, May 30, 2015 at 3:23 PM, Suneel Marthi smar...@apache.org
 wrote:

  Verified {source} * {zip, tar} and all tests pass.
 
  +1 (binding)
 
  On Sat, May 30, 2015 at 5:28 PM, Suneel Marthi smar...@apache.org
 wrote:
 
   This is a call for VOTE to pass Mahout 0.10.1 release candidate
 that's
   available at
  
  
  
 
 https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/
  
   Need atleast 3 PMC +1 (binding) votes to cut the release
  
   Below are the tasks breakdown for the PMC and committers:
  
   Andy Palumbo  Pat Ferrel: verify the binary artifacts and run tests
  
   Suneel  AKM:  verify the src artifacts
  
   Ted/Grant/Drew: verify the hashes and Sigs
  
   The LICENSE.txt and NOTICE.txt still need to be updated and will
 not be
   addressed as part of 0.10.1 release.
  
  
  
 







Re: [VOTE] Mahout 0.10.1 Release Candidate

2015-05-30 Thread Suneel Marthi
Please hold your votes; I will be refreshing staging with another build in
the next hour.

On Sat, May 30, 2015 at 8:31 PM, Andrew Musselman 
andrew.mussel...@gmail.com wrote:

 Likewise source zip and tarballs build and pass tests.

 On Sat, May 30, 2015 at 3:23 PM, Suneel Marthi smar...@apache.org wrote:

  Verified {source} * {zip, tar} and all tests pass.
 
  +1 (binding)
 
  On Sat, May 30, 2015 at 5:28 PM, Suneel Marthi smar...@apache.org
 wrote:
 
   This is a call for VOTE to pass Mahout 0.10.1 release candidate that's
   available at
  
  
  
 
 https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/
  
   Need atleast 3 PMC +1 (binding) votes to cut the release
  
   Below are the tasks breakdown for the PMC and committers:
  
   Andy Palumbo  Pat Ferrel: verify the binary artifacts and run tests
  
   Suneel  AKM:  verify the src artifacts
  
   Ted/Grant/Drew: verify the hashes and Sigs
  
   The LICENSE.txt and NOTICE.txt still need to be updated and will not be
   addressed as part of 0.10.1 release.
  
  
  
 



Re: [VOTE] Mahout 0.10.1 Release Candidate

2015-05-30 Thread Suneel Marthi
Here's the new 0.10.1 Release Candidate

https://repository.apache.org/content/repositories/orgapachemahout-1009/org/apache/mahout/apache-mahout-distribution/0.10.1/

The Voting ends on Sunday, May 31 2015.

Need a +1 from the PMC for each of the line items below for the release to
pass.

1. Ted/Grant:  Verify hashes and checksums - {binary,source} x {zip,tar} +
pom

2. AKM:  Verify examples on EMR  - {binary, source} * {zip, tar}

3. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar}

4. Suneel: Verify build and tests - {source} * {zip, tar}

5. Pat:  Verify examples locally - {source} * {zip, tar}

The LICENSE and NOTICE files have not been updated this time and will be
addressed in future releases.
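
For line item 1, the hash check can be scripted. This is a minimal sketch in
Python, assuming the published digest file starts with the hex digest as its
first whitespace-separated token (digest file layouts vary, so check the
actual file first). The function names are made up for illustration.
Verifying the PGP signatures is a separate step that is not shown here.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(artifact, digest_file, algorithm="sha1"):
    """Compare the computed digest with the one stored next to the artifact."""
    # Assumption: the first token in the digest file is the hex digest.
    published = Path(digest_file).read_text().split()[0].lower()
    return file_digest(artifact, algorithm) == published
```

Running this once per artifact and digest pair covers the {binary,source} x
{zip,tar} + pom matrix listed above.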



On Sat, May 30, 2015 at 8:32 PM, Suneel Marthi suneel.mar...@gmail.com
wrote:

 Please hold ur votes, will be refreshing staging with another build in the
 next hour

 On Sat, May 30, 2015 at 8:31 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:

 Likewise source zip and tarballs build and pass tests.

 On Sat, May 30, 2015 at 3:23 PM, Suneel Marthi smar...@apache.org
 wrote:

  Verified {source} * {zip, tar} and all tests pass.
 
  +1 (binding)
 
  On Sat, May 30, 2015 at 5:28 PM, Suneel Marthi smar...@apache.org
 wrote:
 
   This is a call for VOTE to pass Mahout 0.10.1 release candidate that's
   available at
  
  
  
 
 https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/
  
   Need atleast 3 PMC +1 (binding) votes to cut the release
  
   Below are the tasks breakdown for the PMC and committers:
  
   Andy Palumbo  Pat Ferrel: verify the binary artifacts and run tests
  
   Suneel  AKM:  verify the src artifacts
  
   Ted/Grant/Drew: verify the hashes and Sigs
  
   The LICENSE.txt and NOTICE.txt still need to be updated and will not
 be
   addressed as part of 0.10.1 release.
  
  
  
 





Re: seq2sparse dropping tokens

2015-05-29 Thread Suneel Marthi
Allen, could you please file a JIRA for this?

On Fri, May 29, 2015 at 8:58 AM, Allen McIntosh amcint...@appcomsci.com
wrote:

 This shows up with Mahout 0.10.0 (the distribution archive) and Hadoop
 2.2.0

 When I run seq2sparse on a document containing the following tokens:

 cash cash equival cash cash equival consist highli liquid instrument
 commerci paper time deposit other monei market instrument which origin
 matur three month less aggreg cash balanc bank reclassifi neg balanc
 consist mainli unclear check account payabl neg balanc reclassifi
 account payabl decemb

 the tokens mainli, check and unclear are dropped on the floor (they do
 not appear in the dictionary file).  The issue persists if I change the
 analyzer to SimpleAnalyzer (-a
 org.apache.lucene.analysis.core.SimpleAnalyzer).  I can understand an
 English analyzer doing something like this, but it seems a little
 strange that it would happen with SimpleAnalyzer.  (I wonder if it is
 coincidence that these tokens appear consecutively in the input.)

 What I am trying to do:  The standard analyzers don't do enough, and I
 have no access to the client's cluster to preload a custom analyzer.
 Processing the text before stuffing it into the initial sequence file
 seemed to be the cleanest alternative, since there doesn't seem to be
 any way to add a custom jar when using a stock Mahout app.

 Why dropped or mangled tokens matter, other than as missing information:
  Ultimately what I need to do is calculate topic weights for an
 arbitrary chunk of text.  (See next post.)  If I can't get the tokens
 right, I don't think I can do this.
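
One thing worth ruling out before blaming the analyzer: seq2sparse has a
--minSupport option, and if I recall correctly its default is 2, so a token
that occurs only once in the whole corpus would not reach the dictionary
whatever analyzer is used. That default is an assumption to check against
the release in use. A minimal Python sketch (hypothetical helper, run on the
pre-tokenized text before it goes into the sequence file) that lists such
tokens:

```python
from collections import Counter


def tokens_below_min_support(documents, min_support=2):
    """Return tokens whose corpus-wide count is below min_support, sorted.

    documents is an iterable of whitespace-tokenized strings.
    """
    counts = Counter(tok for doc in documents for tok in doc.split())
    return sorted(tok for tok, n in counts.items() if n < min_support)
```

If mainli, check and unclear show up in that list for the full corpus,
re-running with a minimum support of 1 would be a quick way to test the
theory.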






Re: Row Similarity

2015-05-14 Thread Suneel Marthi
There used to be an online page on mahout.apache.org that Pat Ferrel had
put together a few years ago.
Not sure if it's still around, Pat?

If not, I can write up more detailed steps later today and send them your way.

On Thu, May 14, 2015 at 2:18 PM, Jonathan Seale jonathanpse...@gmail.com
wrote:

 Thanks, guys. Can you recommend any resources that show an example of these
 steps? A google search returns very little information. Now I know what to
 do, but I can't find anything that tells me how to do it.


 On Wed, May 13, 2015 at 11:56 PM, Suneel Marthi smar...@apache.org
 wrote:

  Hi Jonathan,
 
  Here's what you need to do to run RowSimilarity on your CSV-formatted
  data. You would have to use the MapReduce version since the Spark version
  only supports LLR.
 
  1. Convert the CSV to vectors: use CSVIterator and store the vectors as
  SequenceFiles.
  2. Run RowIDJob on the SequenceFile output of (1). This should generate a
  matrix of (IntWritable, VectorWritable) pairs and a docIndex of
  (IntWritable, Text) pairs.
  3. Run RowSimilarityJob on the matrix output from (2), specifying
  CosineDistance and a cutoff threshold. This should generate a matrix
  mapping each row to its most similar rows, with distances.
 
 
 
 
  On Wed, May 13, 2015 at 11:42 PM, Jonathan Seale 
 jonathanpse...@gmail.com
  
  wrote:
 
   Thanks, Charlie,
  
   The data has been through lots of processing, but in an attempt to make
  it
   more Mahout-friendly, I've converted it into a single csv table with
   columns: star_id, wavelength, intensity. My motivation was to make it
  like
   a user_id, item_id, rating table you might see in other Mahout uses.
  
   As opposed to using my local machine, I've setup an instance on Amazon
  with
   hopes of turning this into a remote service. So the install is whatever
   comes with Amazon's default Mahout installation.
  
   Jonathan
  
  
  
   On Wed, May 13, 2015 at 11:29 PM, Charlie Hack 
 charles.t.h...@gmail.com
  
   wrote:
  
Hi Jonathan, how do you have the data stored? More info about your
  setup
the better.
   
   
Charlie
   
   
   
   
   
   
   
   
   
   
   
   
   
On Wednesday, May 13, 2015 at 23:16, Jonathan Seale 
jonathanpse...@gmail.com, wrote:
Scientists,
   
   
I have an astrophysical application for Mahout that I need help with.
   
   
I have 1-dimensional stellar spectra for many, many stars. Each
  spectrum
   
consists of a series of intensity values, one per wavelength of
 light.
  I
   
need to be able to find the cosine similarity between ALL pairs of
  stars.
   
Seems to me this is simply a user-user similarity problem where I
 have
   
stars instead of users, wavelengths instead of items, and intensities
   
instead of ratings/clicks.
   
   
But I'm having difficulty using mahout's row similarity package (I'm
  new
   to
   
this, and these days astronomers code pretty exclusively in python).
 I
   know
   
that I must have to 1) create a sparse matrix where each row is a
 star,
   
columns are wavelengths, and the values are intensity, and 2)
 implement
   row
   
similarity. But I'm just not sure how to do it. Anyone have a good
   resource
   
or be willing to help? I could probably offer some compensation to
  anyone
   
that would be willing to provide a little focussed, personalized
assistance.
   
   
Thanks,
   
Jonathan
   
  
 



Re: Row Similarity

2015-05-13 Thread Suneel Marthi
Hi Jonathan,

Here's what you need to do to run RowSimilarity on your CSV-formatted data.
You would have to use the MapReduce version since the Spark version only
supports LLR.

1. Convert the CSV to vectors: use CSVIterator and store the vectors as
SequenceFiles.
2. Run RowIDJob on the SequenceFile output of (1). This should generate a
matrix of (IntWritable, VectorWritable) pairs and a docIndex of
(IntWritable, Text) pairs.
3. Run RowSimilarityJob on the matrix output from (2), specifying
CosineDistance and a cutoff threshold. This should generate a matrix
mapping each row to its most similar rows, with distances.
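
To make the three steps concrete, here is a small single-machine sketch in
Python of what the pipeline computes. It is not the Mahout jobs themselves:
the function names are made up, the input is assumed to be header-less
star_id,wavelength,intensity lines, and the all-pairs loop is brute force,
so it is only useful for sanity-checking a small sample against the
MapReduce output.

```python
import csv
import math
from collections import defaultdict
from itertools import combinations


def rows_from_csv(lines):
    """Steps 1-2: pivot (star_id, wavelength, intensity) triples into
    one sparse row per star, keyed by wavelength."""
    rows = defaultdict(dict)
    for star_id, wavelength, intensity in csv.reader(lines):
        rows[star_id][wavelength] = float(intensity)
    return rows


def cosine(a, b):
    """Cosine similarity of two sparse rows stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_rows(rows, threshold=0.0):
    """Step 3: cosine similarity for every pair of rows at or above
    the cutoff threshold."""
    result = {}
    for id_a, id_b in combinations(sorted(rows), 2):
        s = cosine(rows[id_a], rows[id_b])
        if s >= threshold:
            result[(id_a, id_b)] = s
    return result
```

For example, similar_rows(rows_from_csv(open("spectra.csv")), 0.9) would
return the star pairs with similarity of at least 0.9; spectra.csv is a
placeholder file name.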




On Wed, May 13, 2015 at 11:42 PM, Jonathan Seale jonathanpse...@gmail.com
wrote:

 Thanks, Charlie,

 The data has been through lots of processing, but in an attempt to make it
 more Mahout-friendly, I've converted it into a single csv table with
 columns: star_id, wavelength, intensity. My motivation was to make it like
 a user_id, item_id, rating table you might see in other Mahout uses.

 As opposed to using my local machine, I've setup an instance on Amazon with
 hopes of turning this into a remote service. So the install is whatever
 comes with Amazon's default Mahout installation.

 Jonathan



 On Wed, May 13, 2015 at 11:29 PM, Charlie Hack charles.t.h...@gmail.com
 wrote:

  Hi Jonathan, how do you have the data stored? More info about your setup
  the better.
 
 
  Charlie
 
 
 
 
 
 
 
 
 
 
 
 
 
  On Wednesday, May 13, 2015 at 23:16, Jonathan Seale 
  jonathanpse...@gmail.com, wrote:
  Scientists,
 
 
  I have an astrophysical application for Mahout that I need help with.
 
 
  I have 1-dimensional stellar spectra for many, many stars. Each spectrum
 
  consists of a series of intensity values, one per wavelength of light. I
 
  need to be able to find the cosine similarity between ALL pairs of stars.
 
  Seems to me this is simply a user-user similarity problem where I have
 
  stars instead of users, wavelengths instead of items, and intensities
 
  instead of ratings/clicks.
 
 
  But I'm having difficulty using mahout's row similarity package (I'm new
 to
 
  this, and these days astronomers code pretty exclusively in python). I
 know
 
  that I must have to 1) create a sparse matrix where each row is a star,
 
  columns are wavelengths, and the values are intensity, and 2) implement
 row
 
  similarity. But I'm just not sure how to do it. Anyone have a good
 resource
 
  or be willing to help? I could probably offer some compensation to anyone
 
  that would be willing to provide a little focussed, personalized
  assistance.
 
 
  Thanks,
 
  Jonathan
 



Re: Replacement for DefaultAnalyzer

2015-05-09 Thread Suneel Marthi
Not sure how this was used in 0.7 (it's roughly 3-year-old legacy code). But
I am guessing this would have been required for Lucene 3.x back then and must
have been dropped for the Lucene 4.x upgrade for 0.8 (circa late 2012).

On Fri, May 8, 2015 at 8:03 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Folks,
 I'm making an upgrade from Mahout 0.7 -- 0.9.
 I am experiencing the same problem as experienced in the following post
 [0].
 Can someone please suggest what I should replace DefaultAnalyzer with? I am
 aware that it was removed from the Mahout API in 0.8?
 In the meantime I am going to test an implementation of Lucene's base
 implementation for the Lucene version matching Mahout 0.9.
 Thanks in advance to anyone who has the context here.
 Best
 Lewis

 [0] http://www.mail-archive.com/user%40mahout.apache.org/msg14344.html

 --
 *Lewis*



Re: Replacement for DefaultAnalyzer

2015-05-09 Thread Suneel Marthi
Mahout 0.9 and 0.10.0 are using Lucene 4.6.1. There's been a change in the
TokenStream workflow in Lucene after 4.5.

What exactly are you trying to do, and where are you stuck now? It would
help if you posted a code snippet or something.

On Sat, May 9, 2015 at 10:51 AM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Suneel,
 Yes this is true. It was dropped exactly due to the Lucene upgrade.
 I'm still working out what to use instead, as the underlying Analyzer
 interface from Lucene also dropped the definition of the
 reusableTokenStream method!
 Is there any other advice on what would be suitable for making the
 upgrade?
 The project I am working on is using Hadoop 1.2.X and will not be upgrading
 for a while. Mahout 0.9 would work perfectly well with this distro; however,
 the upgrade is slightly confusing right now because the API was broken as
 opposed to deprecated.
 Thanks again for any help.
 Lewis

 On Saturday, May 9, 2015, Suneel Marthi smar...@apache.org wrote:

  Not sure how this was used in 0.7 (its  3 yrs legacy). But I am guessing
  this would have been required for Lucene 3x back then and must have been
  dropped for the Lucene 4x upgrade for 0.8 (circa late 2012).
 
  On Fri, May 8, 2015 at 8:03 PM, Lewis John Mcgibbney 
  lewis.mcgibb...@gmail.com wrote:
 
   Hi Folks,
   I'm making an upgrade from Mahout 0.7 -- 0.9.
   I am experiencing the same problem as experienced in the following post
   [0].
   Can someone please suggest what I should replace DefaultAnalyzer with?
 I
  am
   aware that it was removed from the Mahout API in 0.8?
   In the meantime I am going to tst an implementation of Lucene's base
   implementation for the Lucene version matching Mahout 0.9.
   Thanks in advance to anyone who has the context here.
   Best
   Lewis
  
   [0] http://www.mail-archive.com/user%40mahout.apache.org/msg14344.html
  
   --
   *Lewis*
  
 


 --
 *Lewis*



Re: Spectral Clustering

2015-05-07 Thread Suneel Marthi
@ShannonQuinn ??

On Thu, May 7, 2015 at 1:45 PM, sugam bahl sugamb...@yahoo.co.in wrote:

 Hi Team,

 I am new to Mahout and working on a project where I need to cluster json
 documents. I went through the documentation but didn't get enough insights
 about this. Could you please help me on how the start the implementation in
 Java and how shall my input file looks like?

 Thanks,
 Sugam



Re: Spectral Clustering

2015-05-07 Thread Suneel Marthi
Shannon would be the right guy to answer this.

On Thu, May 7, 2015 at 1:52 PM, sugam bahl sugamb...@yahoo.co.in wrote:

 What do we mean by ShannonQuinn??

 Thanks,
 Sugam



 On Thursday, 7 May 2015 10:49 AM, Suneel Marthi smar...@apache.org
 wrote:
 @ShannonQuinn ??


 On Thu, May 7, 2015 at 1:45 PM, sugam bahl sugamb...@yahoo.co.in wrote:

  Hi Team,
 
  I am new to Mahout and working on a project where I need to cluster json
  documents. I went through the documentation but didn't get enough
 insights
  about this. Could you please help me on how the start the implementation
 in
  Java and how shall my input file looks like?
 
  Thanks,
  Sugam
 



Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread Suneel Marthi
Which Mahout version are you running?

On Tue, Apr 21, 2015 at 6:37 AM, mw m...@plista.com wrote:

 Hello,

 I am trying to get tfidf vectors from a corpus of 100k documents. I
 noticed that tfidf sequence file is empty, while the tf vectors are not.

 Here is the log from SparseVectorsFromSequenceFiles:

 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Maximum
 n-gram size is: 1
 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Minimum
 LLR value: 1.0
 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Number
 of reduce tasks: 1
 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles:
 Tokenizing documents in /opt/seq
 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Creating
 Term Frequency Vectors
 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles:
 Calculating IDF
 INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Pruning

 Here is the tfidf output dir:

 root@test:[/opt/sparse/tfidf-vectors] # ll
 total 20K
 drwxr-xr-x 2 tomcat7 tomcat7 4.0K Apr 21 12:27 .
 drwxr-xr-x 9 tomcat7 tomcat7 4.0K Apr 21 12:27 ..
 -rw-r--r-- 1 tomcat7 tomcat7   90 Apr 21 12:27 part-r-0
 -rw-r--r-- 1 tomcat7 tomcat7   12 Apr 21 12:27 .part-r-0.crc
 -rw-r--r-- 1 tomcat7 tomcat7    0 Apr 21 12:27 _SUCCESS
 -rw-r--r-- 1 tomcat7 tomcat7    8 Apr 21 12:27 ._SUCCESS.crc

 Here is the tf output dir:
 root@test:[/opt/sparse/tf-vectors] # ll
 total 3.7M
 drwxr-xr-x 2 tomcat7 tomcat7 4.0K Apr 21 12:27 .
 drwxr-xr-x 9 tomcat7 tomcat7 4.0K Apr 21 12:27 ..
 -rw-r--r-- 1 tomcat7 tomcat7 3.6M Apr 21 12:27 part-r-0
 -rw-r--r-- 1 tomcat7 tomcat7  29K Apr 21 12:27 .part-r-0.crc
 -rw-r--r-- 1 tomcat7 tomcat7    0 Apr 21 12:27 _SUCCESS
 -rw-r--r-- 1 tomcat7 tomcat7    8 Apr 21 12:27 ._SUCCESS.crc

 Here is the input dir:
 root@test:[/opt/seq] # ll
 total 81M
 drwxr-xr-x 2 tomcat7 tomcat7 4.0K Apr 21 12:25 .
 drwxrwxrwx 9 tomcat7 root    4.0K Apr 21 12:25 ..
 -rw-r--r-- 1 tomcat7 tomcat7  31M Apr 21 12:25 part-m-0
 -rw-r--r-- 1 tomcat7 tomcat7 242K Apr 21 12:25 .part-m-0.crc
 -rw-r--r-- 1 tomcat7 tomcat7  31M Apr 21 12:25 part-m-1
 -rw-r--r-- 1 tomcat7 tomcat7 242K Apr 21 12:25 .part-m-1.crc
 -rw-r--r-- 1 tomcat7 tomcat7  20M Apr 21 12:25 part-m-2
 -rw-r--r-- 1 tomcat7 tomcat7 155K Apr 21 12:25 .part-m-2.crc
 -rw-r--r-- 1 tomcat7 tomcat7    0 Apr 21 12:25 _SUCCESS
 -rw-r--r-- 1 tomcat7 tomcat7    8 Apr 21 12:25 ._SUCCESS.crc


 I am running it using the toolrunner with the following parameters:
 -i /opt/seq -o /opt/sparse/ -nv --maxDFSigma 2.0 --weight tfidf

 Any hints why it might be failing?

 Best,
 Max
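
Since the tf vectors are populated and only the pruned tfidf output is
empty, the --maxDFSigma cutoff is a natural suspect. The sketch below is a
hypothetical reconstruction in Python, not Mahout's code: it assumes the
standard deviation of the document frequencies is taken around a mean of 0
and that the cutoff is first expressed as a whole-number percentage of the
corpus size. Both points are assumptions to verify against the source of the
release in use. Under those assumptions, a large corpus with mostly rare
terms rounds the percentage down to 0 and every term is pruned.

```python
import math


def max_df_threshold(doc_freqs, num_docs, max_df_sigma, truncate_percent=True):
    """Document-frequency cutoff derived from a multiple of sigma.

    Assumption (not confirmed against Mahout): sigma is computed around a
    mean of 0 and the cutoff goes through an integer percentage.
    """
    std_dev = math.sqrt(sum(df * df for df in doc_freqs) / len(doc_freqs))
    percent = 100.0 * max_df_sigma * std_dev / num_docs
    if truncate_percent:
        percent = int(percent)  # rounding down can give 0 on large corpora
    return int(num_docs * percent / 100.0)


def surviving_terms(doc_freq_by_term, threshold):
    """Terms whose document frequency does not exceed the cutoff, sorted."""
    return sorted(t for t, df in doc_freq_by_term.items() if df <= threshold)
```

A quick experiment that does not depend on any of these assumptions is to
re-run without --maxDFSigma and see whether the tfidf vectors come back.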




Apache Mahout 0.10.0 Released

2015-04-12 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.10.0.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms. This release has some
major changes from 0.9, including the new Apache Spark back-end (with H2O
in progress), a new matrix math DSL, streamlined content and bug fixes.

The Mahout Math environment we call “Samsara” for its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs most fully on
Spark.


To get started with Apache Mahout 0.10.0, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.10.0/


Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.


RELEASE HIGHLIGHTS

Mahout-Samsara has implementations for these generalized concepts:

   - Linear algebra operations, multiply, transpose, slice, row and column
     iterators
   - Distributed BLAS optimizer
   - R-Like operators; for example A.t %*% A, which performs an optimized
     ‘thin’ A’A
   - Packaged as extensions to Scala
   - Includes a Scala REPL based interactive shell that runs on Spark
   - Integrates with compatible libraries like MLlib


Mahout has historically been about highly scalable algorithms, and though
we continue to support many of the past Hadoop MapReduce implementations
(now with full Hadoop 2 support), Mahout also comes with some new
Mahout-Samsara based implementations:

   - Distributed and in-core: Stochastic Singular Value Decomposition (SSVD)
   - Distributed Principal Component Analysis (PCA)
   - Distributed and in-core QR Reduction (QR)
   - Distributed Alternating Least Squares decomposition (ALS)
   - Collaborative Filtering: Item and Row Similarity based on cooccurrence
     and supporting multimodal user actions
   - Naive Bayes Classification


RELATION TO MACHINE LEARNING LIBS

Since Mahout is positioned as an environment, it also allows seamless use of
libraries like MLlib. If you need scalable linear algebra, think Mahout; if
you need a specific algorithm, check any compatible library as well.

STATS

A total of 205 separate JIRA issues are addressed in this release [2], with
65 bugfixes.

GETTING STARTED

Download the release artifacts and signatures at
https://mahout.apache.org/general/downloads.html. The examples directory
contains several working examples of the core functionality available in
Mahout. These can be run via scripts in the examples/bin directory. Most
examples do not need a Hadoop cluster in order to run.

FUTURE PLANS

0.10.1

As the project moves towards a 0.10.1 release, we are working on the
following:


   - Implement an end-to-end pipeline for an itemsimilarity recommender
     workflow on top of H2O.
   - Implement a more robust text processing pipeline
   - Incorporate more statistical operations
   - Support Spark DataFrames


Post 0.10.1

We already see the need for work in these areas:


   - Mahout algebra performance improvements and bug fixes
   - Streaming data
   - Visualization
   - Fuller H2O support
   - Apache Flink support
   - In-core matrix performance optimization


CONTRIBUTING


If you are interested in contributing, please see our How to Contribute
http://mahout.apache.org/developers/how-to-contribute.html[3] page or
contact us via email at d...@mahout.apache.org.

CREDITS

As with any release, we wish to thank all of the users and contributors to
Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
individual credits, as there are too many to list here.

[1] https://github.com/apache/mahout/blob/master/CHANGELOG
[2]
https://issues.apache.org/jira/browse/MAHOUT-1678?jql=project%20%3D%20MAHOUT%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20in%20%280.10.0%2C%201.0%29

[3] http://mahout.apache.org/developers/how-to-contribute.html


Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-11 Thread Suneel Marthi
Thanks everyone. We have had 5 +1 votes from the PMC, so this release has
passed and the voting is officially closed.
Will send a formal release announcement once the release is finalized.

Thanks again.

On Sat, Apr 11, 2015 at 12:20 PM, Pat Ferrel p...@occamsmachete.com wrote:

 Just built an external app using sbt against the staging repo and it looks
 good to me

 +1 (binding)

 On Apr 11, 2015, at 9:12 AM, Andrew Palumbo ap@outlook.com wrote:

 After testing examples locally from .tar and .zip distribution and testing
 the staged mahout-math artifact in a java application, I am happy with this
 release.

 +1 (binding)
 On 04/11/2015 11:45 AM, Suneel Marthi wrote:
  After checking the {source} * {tar,zip} and running a few tests locally,
 I
  am fine with this release.
 
  +1 (binding)
 
  On Sat, Apr 11, 2015 at 11:43 AM, Andrew Musselman 
  andrew.mussel...@gmail.com wrote:
 
  After checking the binary tarball and zip, and running through all the
  examples on an EMR cluster, I am good with this release.
 
  +1 (binding)
 
  On Fri, Apr 10, 2015 at 9:34 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
  Ah... forgot this.
 
  +1 (binding)
 
  On Fri, Apr 10, 2015 at 11:14 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
  I downloaded and tested the signatures and check-sums on
  {binary,source}
  x
  {zip,tar} + pom.  All were correct.
 
  One thing that I worry a little about is that the name of the artifact
  doesn't include apache.  Not sure that is a hard requirement, but it
  seems a good thing to do.
 
 
 
  On Fri, Apr 10, 2015 at 8:16 PM, Suneel Marthi 
  suneel.mar...@gmail.com
  wrote:
 
  Here's a new Mahout 0.10.0 Release Candidate at
 
  https://repository.apache.org/content/repositories/orgapachemahout-1007/
 
  Voting for this ends tomorrow.  We need at least 3 PMC +1 votes for the
  release to pass.
 
  Grant, Ted:  I would appreciate it if you could verify the signatures.
 
 
  Rest: Please test the artifacts.
 
  Thanks to all the contributors and committers.
 
  Regards,
  Suneel
 
  On Fri, Apr 10, 2015 at 11:45 AM, Pat Ferrel p...@occamsmachete.com
  wrote:
 
  Ran well but we have a packaging problem with the binary distro. Will
  require either a pom or code change I think, hold the vote.
 
 
 
  On Apr 9, 2015, at 4:31 PM, Andrew Musselman 
  andrew.mussel...@gmail.com
  wrote:
 
  Running on EMR now.
 
  On Thu, Apr 9, 2015 at 3:52 PM, Pat Ferrel p...@occamsmachete.com
  wrote:
  I can't run it (due to a messed-up dev machine) but I verified the
  artifacts by building an external app with sbt using the staged repo
  instead of my local .m2 cache. This means the Scala classes were resolved
  correctly from the artifacts.
 
  Hope someone can actually run it on a cluster
 
 
  On Apr 9, 2015, at 2:42 PM, Suneel Marthi 
  suneel.mar...@gmail.com
  wrote:
  Please find the Mahout 0.10.0 release candidate at
 
  https://repository.apache.org/content/repositories/orgapachemahout-1005/
 
  Voting runs until Saturday, April 11, 2015; we need at least 3 PMC +1
  votes for the candidate release to pass.
 
  Thanks again to all the committers and contributors for their hard work
  over the past few weeks.
 
  Regards,
  Suneel
  On Behalf of Apache Mahout Team
 
 
 
 





Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-10 Thread Suneel Marthi
Checking past releases, the artifact names have always been
'mahout-distribution-*'.  We could change that to
'apache-mahout-distribution-*' for future releases.

On Sat, Apr 11, 2015 at 12:14 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 I downloaded and tested the signatures and check-sums on {binary,source} x
 {zip,tar} + pom.  All were correct.

 One thing that I worry a little about is that the name of the artifact
 doesn't include apache.  Not sure that is a hard requirement, but it
 seems a good thing to do.



 On Fri, Apr 10, 2015 at 8:16 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:

  Here's a new Mahout 0.10.0 Release Candidate at
 
  https://repository.apache.org/content/repositories/orgapachemahout-1007/
 
  Voting for this ends tomorrow.  We need at least 3 PMC +1 votes for the
  release to pass.
 
  Grant, Ted:  I would appreciate it if you could verify the signatures.
 
 
  Rest: Please test the artifacts.
 
  Thanks to all the contributors and committers.
 
  Regards,
  Suneel
 
  On Fri, Apr 10, 2015 at 11:45 AM, Pat Ferrel p...@occamsmachete.com
  wrote:
 
   Ran well but we have a packaging problem with the binary distro. Will
   require either a pom or code change I think, hold the vote.
  
  
  
   On Apr 9, 2015, at 4:31 PM, Andrew Musselman 
 andrew.mussel...@gmail.com
  
   wrote:
  
   Running on EMR now.
  
   On Thu, Apr 9, 2015 at 3:52 PM, Pat Ferrel p...@occamsmachete.com
  wrote:
  
I can't run it (due to a messed-up dev machine) but I verified the
artifacts by building an external app with sbt using the staged repo
instead of my local .m2 cache. This means the Scala classes were resolved
correctly from the artifacts.
   
Hope someone can actually run it on a cluster
   
   
On Apr 9, 2015, at 2:42 PM, Suneel Marthi suneel.mar...@gmail.com
   wrote:
   
Please find the Mahout 0.10.0 release candidate at
   
  https://repository.apache.org/content/repositories/orgapachemahout-1005/
   
Voting runs until Saturday, April 11, 2015; we need at least 3 PMC +1
votes for the candidate release to pass.

Thanks again to all the committers and contributors for their hard work
over the past few weeks.
   
Regards,
Suneel
On Behalf of Apache Mahout Team
   
   
  
  
 



Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-10 Thread Suneel Marthi
Here's a new Mahout 0.10.0 Release Candidate at

https://repository.apache.org/content/repositories/orgapachemahout-1007/

Voting for this ends tomorrow.  We need at least 3 PMC +1 votes for the
release to pass.

Grant, Ted:  I would appreciate it if you could verify the signatures.


Rest: Please test the artifacts.

Thanks to all the contributors and committers.

Regards,
Suneel

On Fri, Apr 10, 2015 at 11:45 AM, Pat Ferrel p...@occamsmachete.com wrote:

 Ran well but we have a packaging problem with the binary distro. Will
 require either a pom or code change I think, hold the vote.



 On Apr 9, 2015, at 4:31 PM, Andrew Musselman andrew.mussel...@gmail.com
 wrote:

 Running on EMR now.

 On Thu, Apr 9, 2015 at 3:52 PM, Pat Ferrel p...@occamsmachete.com wrote:

  I can't run it (due to a messed-up dev machine) but I verified the
  artifacts by building an external app with sbt using the staged repo
  instead of my local .m2 cache. This means the Scala classes were resolved
  correctly from the artifacts.
 
  Hope someone can actually run it on a cluster
 
 
  On Apr 9, 2015, at 2:42 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:
 
  Please find the Mahout 0.10.0 release candidate at
  https://repository.apache.org/content/repositories/orgapachemahout-1005/
 
  Voting runs until Saturday, April 11, 2015; we need at least 3 PMC +1
  votes for the candidate release to pass.
 
  Thanks again to all the committers and contributors for their hard work
  over the past few weeks.
 
  Regards,
  Suneel
  On Behalf of Apache Mahout Team
 
 




Re: Error running HMM model

2015-04-07 Thread Suneel Marthi
From $MAHOUT_HOME try running ./bin/mahout
and see if that works.



On Wed, Apr 8, 2015 at 1:22 AM, Raghuveer alwaysra...@yahoo.com.invalid
wrote:


 I am learning Mahout usage and, as suggested here, am trying to run my
 sample, but I get the error below. Kindly suggest.

 Error: Could not find or load main class ..mahout

 Note: I have set MAHOUT_HOME to trunk and $PATH has $MAHOUT_HOME/bin in
 ~/.bashrc. I am also unable to run mahout from the installation directory;
 I see the same error. The command I am running is:

 java -Xmx1024M ./mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000




Re: Error running HMM model

2015-04-07 Thread Suneel Marthi
Could you post the original issue more clearly formatted? It's hard to
discern from your earlier post what is wrong.

It seems like an installation issue on your end.

On Wed, Apr 8, 2015 at 1:37 AM, Raghuveer alwaysra...@yahoo.com wrote:

 Same no change except

 Error: Could not find or load main class ..bin.mahout



   On Wednesday, April 8, 2015 10:55 AM, Suneel Marthi 
 suneel.mar...@gmail.com wrote:


 From $MAHOUT_HOME try running ./bin/mahout
 and see if that works.



 On Wed, Apr 8, 2015 at 1:22 AM, Raghuveer alwaysra...@yahoo.com.invalid
 wrote:


 I am learning Mahout usage and, as suggested here, am trying to run my
 sample, but I get the error below. Kindly suggest.

 Error: Could not find or load main class ..mahout

 Note: I have set MAHOUT_HOME to trunk and $PATH has $MAHOUT_HOME/bin in
 ~/.bashrc. I am also unable to run mahout from the installation directory;
 I see the same error. The command I am running is:

 java -Xmx1024M ./mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000







Re: fast performance way of writing preferences to file?

2015-04-06 Thread Suneel Marthi
FYI, adding to Pat's reply below: Slope-One has long been deprecated.

On Mon, Apr 6, 2015 at 5:00 PM, Pat Ferrel p...@occamsmachete.com wrote:

 Sorry, we are trying to get a release out.

 You can look at a custom similarity measure. Look at where
 SIMILARITY_COSINE leads you and customize that maybe? There are in-memory
 and mapreduce versions and not sure which you are using. That is code I
 haven’t looked at for a long time so can’t get you much closer.
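
 For reference, a minimal sketch of what an adjusted cosine item similarity
 computes: each rating is centred on the mean of the user who gave it, and
 the cosine is then taken over the users who rated both items. This is plain
 Java, not Mahout's similarity API; the class name AdjustedCosine and the
 nested-map layout (userId -> itemId -> rating) are assumptions made purely
 for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class AdjustedCosine {

    // Mean of all ratings given by one user (itemId -> rating).
    static double userMean(Map<Long, Double> userRatings) {
        double sum = 0.0;
        for (double r : userRatings.values()) {
            sum += r;
        }
        return userRatings.isEmpty() ? 0.0 : sum / userRatings.size();
    }

    // Adjusted cosine similarity between two items.
    // ratings maps userId -> (itemId -> rating); only users who rated
    // both items contribute. Returns NaN when the value is undefined.
    static double similarity(Map<Long, Map<Long, Double>> ratings, long itemA, long itemB) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map<Long, Double> userRatings : ratings.values()) {
            Double ra = userRatings.get(itemA);
            Double rb = userRatings.get(itemB);
            if (ra == null || rb == null) {
                continue; // this user did not rate both items
            }
            double mean = userMean(userRatings);
            double da = ra - mean; // rating centred on the user's mean
            double db = rb - mean;
            dot += da * db;
            normA += da * da;
            normB += db * db;
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0.0 ? Double.NaN : dot / denom;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two users, three items.
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        Map<Long, Double> u1 = new HashMap<>();
        u1.put(1L, 5.0); u1.put(2L, 1.0); u1.put(3L, 3.0);
        Map<Long, Double> u2 = new HashMap<>();
        u2.put(1L, 4.0); u2.put(2L, 2.0); u2.put(3L, 3.0);
        ratings.put(100L, u1);
        ratings.put(200L, u2);
        System.out.println(similarity(ratings, 1L, 2L));
    }
}
```

 To use something like this inside Mahout, the same arithmetic would have to
 be wrapped in whichever similarity interface the recommender expects; that
 wiring is not shown here.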


 On Apr 3, 2015, at 10:52 AM, PierLorenzo Bianchini
 piell...@yahoo.com.INVALID wrote:

 Hi again,
 seeing the answers to this question and the other I had posted (adjusted
 cosine similarity for item-based recommender?), I think I should clarify a
 bit what I'm trying to achieve and why I (believe I should) do things the
 way I'm doing.

 I'm doing a class called Learning from User-Generated data. Our first
 assignment deals with analysing the results of various types of
 recommenders. I'll go as far as saying old-school recommenders, given the
 content of your answers.
 We have been introduced to:
 * Memory based:
 - user-based
 - item-based (*with* adjusted cosine similarity!)
 - slope-one
 - graph-based transitivity
 * Model based:
 - preprocessed item/user based (? this is unclear to me but I didn't
 reach this part of the assignment so I'll search for information before I
 ask questions; I also found an article where they mentioned slope-one
 amongst the model based; I guess I'll need to do more research on this)
 - matrix factorization-based (I saw that SVD is available in Mahout;
 my project partner is looking into that right now)

 We have a *static* training dataset (800.000 user,movie,preference
 triples) and another static dataset for which we have to extract the
 predicted preferences (200.000 user,movie tuples) and write them back to
 a file (i.e. recompose the user,movie,preference triples). Note that
 this will never go in a production environment, as it is merely a
 university requirement. For the same reason, I would prefer not to mix up
 things too much and I'd rather do a step-by-step learning (i.e. focus on
 Mahout for now, before I dig deeper and check the search-based approach,
 which uses DB-mahout-solr-spark... maybe a bit too much to handle at once
 with the deadline we were given).

 So if I might get back to my original questions (again, I'm sorry for
 being stubborn but I'm under specific constraints - I'll really try to
 understand the search-based approach when I have more time) ;)
 1. I'm guessing that to implement an adjusted cosine similarity I should
 extend AbstractSimilarity (or maybe even AbstractRecommender?). Is this
 right?
 2. I still can't believe that it takes more than a few minutes at most to
 go through my 200.000 lines and find the already calculated preference.
 What am I doing wrong? :/ Should I store my whole datamodel in a file
 (how?) and then read through the file? I don't see how this could be faster
 than just reading the exact value I'm searching for...
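
 On question 2, one possible reading is that each of the 200.000 lookups
 scans the data instead of hitting an index. A minimal sketch, assuming the
 predicted preferences are already computed and fit in memory: keep them in
 a HashMap keyed on the (userId, movieId) pair and stream the answers
 through a BufferedWriter. The class PreferenceWriter, the packed long key
 and the in-memory map are all hypothetical, plain Java rather than Mahout
 API. If the time is actually spent computing the user-based estimates
 themselves, this will not help.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreferenceWriter {

    // Pack (userId, movieId) into one long so the pair can be a map key.
    static long key(int userId, int movieId) {
        return ((long) userId << 32) | (movieId & 0xFFFFFFFFL);
    }

    // Write one "userId,movieId,preference" line per query pair.
    // Pairs without a stored preference get an empty third field.
    static void write(Map<Long, Float> predicted, List<int[]> queries, Writer out)
            throws IOException {
        BufferedWriter writer = new BufferedWriter(out);
        for (int[] q : queries) {
            Float pref = predicted.get(key(q[0], q[1])); // hash lookup, no scan
            writer.write(q[0] + "," + q[1] + "," + (pref == null ? "" : pref.toString()));
            writer.write('\n');
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical toy data standing in for the real datasets.
        Map<Long, Float> predicted = new HashMap<>();
        predicted.put(key(1, 10), 4.5f);
        List<int[]> queries = new ArrayList<>();
        queries.add(new int[] {1, 10});
        queries.add(new int[] {2, 20});
        StringWriter out = new StringWriter();
        write(predicted, queries, out);
        System.out.print(out);
    }
}
```

 In a real run the Writer would wrap a file rather than a StringWriter; the
 buffering is what keeps 200.000 small writes cheap.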

 Thanks again for your answers! Regards,

 Pier Lorenzo


 
 On Fri, 4/3/15, Ted Dunning ted.dunn...@gmail.com wrote:

 Subject: Re: fast performance way of writing preferences to file?
 To: user@mahout.apache.org user@mahout.apache.org
 Date: Friday, April 3, 2015, 5:52 PM

 Are you sure that the problem is writing the results?  It seems to me that
 the real problem is the use of a user-based recommender.

 For such a small data set, for instance, a search-based recommender will be
 able to make recommendations in less than a millisecond with multiple
 recommendations possible in parallel.  This should allow you to do 200,000
 recommendations in a few minutes on a single machine.

 With such a small dataset, indicator-based methods may not be the best
 option.  To improve that, try using something larger such as the million
 song dataset.
 See http://labrosa.ee.columbia.edu/millionsong/

 Also, using and estimating ratings is not a particularly good thing to be
 doing if you want to build a real recommender.


 On Fri, Apr 3, 2015 at 3:26 AM, PierLorenzo Bianchini
 piell...@yahoo.com.invalid wrote:

  Hello everyone,
  I'm new to mahout, to recommender systems and to the mailing list.
 
  I'm trying to find a (fast) way to write back preferences to a file. I
  tried a few methods but I'm sure there must be a better approach.
 
  Here's the deal (you can find the same post in stackoverflow[1]).
  I have a training dataset of 800.000 records from 6000 users rating 3900
  movies. These are stored in a comma separated file like:
  userId,movieId,preference. I have another dataset (200.000 records) in the
  format: userId,movieId. My goal is to use the first dataset as a
  training-set, in order to determine the missing preferences of the second
  set.
 
  So far, I managed to load the training dataset and I generated user-based
  recommendations. 

Re: How to change /tmp directory for mahout usage of map-reduce?

2015-04-01 Thread Suneel Marthi
If you are running Spectral KMeans via the command line, you should be able
to set the parameter -tempDir to point to a different path.

On Wed, Apr 1, 2015 at 1:55 AM, Andrew Musselman andrew.mussel...@gmail.com
 wrote:

 Can you let us know which code/scripts you're using?

 On Tuesday, March 31, 2015, Vikas Kumar kumar...@umn.edu wrote:

  Hello,
 
  I am using the Mahout Spectral clustering example, which internally calls a
  map reduce job. Right now, it is using the */tmp/hadoop-username/mapred/..*
  directory by default for its operations.
 
  Can someone please let me know how to make mahout use a different path?
 
  Thanks
  Vikas
 



Re: How to change /tmp directory for mahout usage of map-reduce?

2015-04-01 Thread Suneel Marthi
You need to set the temp path in your Configuration and pass the
Configuration object to the subsequent calls.

IIRC, Spectral KMeans internally calls other MapReduce jobs like
MatrixDiagonalizeJob, VectorMatrixMultiplicationJob, SSVD.
So ensure that you are passing common parameters like tempDir, outputDir
etc. via Configuration across the jobs.

Shannon could help better here.

On Wed, Apr 1, 2015 at 3:21 AM, Vikas Kumar kumar...@umn.edu wrote:

 Sorry, it didn't solve the problem.

 What it changed was the *tmp* directory for the following (taken from the
 log attached above):
 15/04/01 01:18:13 INFO mapred.MapTask: Processing split:
 file:/export/scratch/vikas/PRIVATE DIRECTORIES
 /tmp/calculations/seqfile/part-r-0:0+86000

 However, the *tmp* directory for TrackerDistributedCacheManager is still
 the same:

 15/04/01 01:18:13 INFO filecache.TrackerDistributedCacheManager: Creating
 vector in */tmp/hadoop-vikas/mapred/local*/archive/-623590149816891030_-
 1428839080_1939951392/file/export/scratch/vikas/PRIVATE
 DIRECTORIES/tmp/calculations-work--3390146237769593830 with rwxr-xr-x

 It seems I just need to set the right resource (Path or String) in the
 Configuration object passed as the parameter to Spectral Clustering, but
 I am not able to figure out which one.

 Thanks






 On Wed, Apr 1, 2015 at 1:43 AM, Vikas Kumar kumar...@umn.edu wrote:

  That was helpful to figure out what was required.
  I had to set the right path for variable *tmp* in the function from :
 
  Path tmp = new Path(tmp)
 
  to
 
  Path tmp = new Path(CHOSEN DIRECTORY);
 
  Silly mistake. Thanks for the clue :)
 
  -Vikas
 
 
 
 
 
 
 
  On Wed, Apr 1, 2015 at 1:34 AM, Suneel Marthi suneel.mar...@gmail.com
  wrote:
 
  If you are running Spectral KMeans via the command line, you should be able
  to set the parameter -tempDir to point to a different path.
 
  On Wed, Apr 1, 2015 at 1:55 AM, Andrew Musselman 
  andrew.mussel...@gmail.com
   wrote:
 
   Can you let us know which code/scripts you're using?
  
   On Tuesday, March 31, 2015, Vikas Kumar kumar...@umn.edu wrote:
  
Hello,
   
I am using the Mahout Spectral clustering example, which internally calls a
map reduce job. Right now, it is using the */tmp/hadoop-username/mapred/..*
directory by default for its operations.

Can someone please let me know how to make mahout use a different path?
   
Thanks
Vikas
   
  
 
 
 



Re: Text clustering with SVD

2015-03-30 Thread Suneel Marthi
Here are the steps if you are using Mahout-mrlegacy in the present Mahout trunk:

1. Generate tfidf vectors from the input corpus using seq2sparse (I am
assuming you have done this before and hence skipping the details)

2. Run SSVD on the generated tfidf vectors from (1)

  ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false

 k = no. of reduced basis vectors

You would need the U*Sigma output of the PCA flow for the next
clustering step

3. Run KMeans (or any other clustering algo) with the U*Sigma from (2) as
input.


On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan prince.don...@googlemail.com
wrote:

 Hallo Mahout users,

 I'm working on text clustering and would like to reduce the features to
 enhance the clustering process.
 I would like to use Singular Value Decomposition before the clustering
 process. I would be thankful to hear from anyone who has used this before.
 Is it a good idea for clustering?
 Is there any other method in Mahout to reduce the text features before
 clustering?
 Does anyone have an idea how I can apply SVD using Java code?
 Thanks in advance,
 Donni


