LDA will be available in Spark 1.3.0, which should be released in a few days (according to the Spark mailing list).
https://issues.apache.org/jira/browse/SPARK-1405

This looks to be a long list of potential improvements coming along in time:
https://issues.apache.org/jira/browse/SPARK-5572

What sort of hardware do you have available for your work?

On Thu, Mar 12, 2015 at 12:39 PM, David Starina <[email protected]> wrote:

> Thank you for your suggestions, I am also considering Spark. Actually I
> was hoping I will be able to compare the speed of the Mahout's (MapReduce)
> and MLLib's (Spark) implementations of LDA algorithm, but am not sure
> whether the MLLib's implementation is already available in the current
> version. I hope I will at least be able to try one of the implementations.
> Anyway, don't want to spam your developer mailing list with this :-)
>
> --David
>
> On Thu, Mar 12, 2015 at 6:21 PM, Konstantin Boudnik <[email protected]>
> wrote:
>
>> And speaking from my former academic background - it never hurts if you
>> thesis is sexy. And Spark is quite hot at the moment ;)
>>
>> Cos
>>
>> On Thu, Mar 12, 2015 at 01:15PM, jay vyas wrote:
>> > @David i like rj's idea on considering mllib, which is something which is
>> > gauranteed to be bigtop supported ! possibly consider that as an option if
>> > you want to build your thesis on bigtop
>> >
>> > On Thu, Mar 12, 2015 at 12:52 PM, Konstantin Boudnik <[email protected]> wrote:
>> >
>> > > On Thu, Mar 12, 2015 at 06:04AM, jay vyas wrote:
>> > > > Hi david !
>> > > >
>> > > > We found that mahout 0.9 , iirc, was released incompatible with Yarn at the
>> > > > time, and there wasn't any commandline option that you could run when
>> > > > compiling which fixed that issue. So that really made us realize we needed
>> > > > community to participate with us.
>> > > >
>> > > > 1) I've reached out to the mahout community, and maybe they will join
>> > > > forces with us before it is dropped, but for us, we simply have too many
>> > > > other priorities and nobody from the mahout community was interested in
>> > > > collaborating with us on package testing in bigtop... So much like fedora,
>> > > > debian, and so , once the curators of the have no interest in packaging it,
>> > > > it becomes hard to keep in the distro.
>> > > >
>> > > > 2) Are you interested in maintaining mahout packaging in bigtop? That
>> > > > might be a nice addition to your thesis . It also would give you some
>> > > > interesting insight into the libraries that mahout uses, and how it uses
>> > > > hadoop APIs, etc... I'd be able to help you get up to speed with the basics
>> > > > of building bigtop if you have that interest.
>> > > >
>> > > > 3) RE: W/o bigtop, you can always build/compile/install mahout from source
>> > > > or from tarballs if need be. however this tends to be an annoying thing to
>> > > > maintain and manually make sure it interoperates with your yarn distro
>> > > > etc....
>> > >
>> > > Not to say that the same compatiblity issue between Hadoop 2.x and Mahout will
>> > > still be there when you build it yourself.
>> > >
>> > > Cos
>> > >
>> > > > On Thu, Mar 12, 2015 at 1:57 AM, David Starina <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Hi guys,
>> > > > >
>> > > > > I'm just an observer, a passer-by you might say (for now) of this mailing
>> > > > > list, so I hope you won't mind me commenting on this. I was planning to use
>> > > > > Hadoop with Mahout in my thesis, so this thread kind of freaked me out.
>> > > > > Since you are mentioning the two pieces of software are incompatible - does
>> > > > > that mean it is not possible to get them to work together, or just that it
>> > > > > requires some extra effort? Also, there are some algorithms that work with
>> > > > > Spark - do you know whether those still work with recent versions of Spark?
>> > > > > Is there a lot of work to manually install Mahout without Bigtop?
>> > > > >
>> > > > > Anyhow, hope the Mahout guys find their focus again.
>> > > > >
>> > > > > Best regards,
>> > > > > David
>> > > > >
>> > > > >
>> > > > > On Thursday, March 12, 2015, jay vyas <[email protected]> wrote:
>> > > > >
>> > > > >> okay, lets drop it... Im fine with that.
>> > > > >>
>> > > > >> On Wed, Mar 11, 2015 at 7:49 PM, Konstantin Boudnik <[email protected]>
>> > > > >> wrote:
>> > > > >>
>> > > > >>> But the last time, back in 0.8, we found that runtime is pretty broken.
>> > > > >>> So, is there any real reason to keep on pushing an incompatible piece of
>> > > > >>> software?
>> > > > >>>
>> > > > >>> Cos
>> > > > >>>
>> > > > >>> On Tue, Mar 10, 2015 at 09:42AM, jay vyas wrote:
>> > > > >>> > At this point we can just keep packaging as is, but if bugs crop up, drop
>> > > > >>> > it unless we can get help
>> > > > >>> > On Mon, Mar 9, 2015 at 11:49 PM, Konstantin Boudnik <[email protected]>
>> > > > >>> > wrote:
>> > > > >>> >
>> > > > >>> > Should read
>> > > > >>> >
>> > > > >>> > So, anyone is interested to maintain Mahout OR a thing of similar
>> > > > >>> > nature?
>> > > > >>> >
>> > > > >>> > Sorry
>> > > > >>> > On Mon, Mar 09, 2015 at 08:45PM, Konstantin Boudnik wrote:
>> > > > >>> > > So, anyone is interested to maintain Mahout and a thing of similar
>> > > > >>> > > nature?
>> > > > >>> >
>> > > > >>> > >
>> > > > >>> > > Cos
>> > > > >>> > >
>> > > > >>> > > On Sat, Mar 07, 2015 at 02:13AM, Konstantin Boudnik wrote:
>> > > > >>> > > > I think it eventually boils down to who will be maintaining the
>> > > > >>> > > > component.
>> > > > >>> > > >
>> > > > >>> > > > As Jay said - there's maintainer for the component and if it will continue
>> > > > >>> > > > like this we might have no choice but delete it: I think right now it blocks
>> > > > >>> > > > the release.
>> > > > >>> > > >
>> > > > >>> > > > Cos
>> > > > >>> > > >
>> > > > >>> > > > On Fri, Mar 06, 2015 at 02:29PM, Ed - 0x1b wrote:
>> > > > >>> > > > > some links to some of Mahout's replacements - not all Apache projects.
>> > > > >>> > > > >
>> > > > >>> > > > > https://gigaom.com/2014/03/27/apache-mahout-hadoops-original-machine-learning-project-is-moving-on-from-mapreduce/
>> > > > >>> > > > > http://0xdata.com/
>> > > > >>> > > > > https://spark.apache.org/mllib/
>> > > > >>> > > > >
>> > > > >>> > > > > https://databricks.com/blog/2014/06/30/sparkling-water-h20-spark.html
>> > > > >>> > > > > https://github.com/apache/mahout/tree/master/h2o
>> > > > >>> > > > >
>> > > > >>> > > > > and
>> > > > >>> > > > >
>> > > > >>> > > > > https://gigaom.com/2014/02/28/cloudera-is-rebuilding-machine-learning-for-hadoop-with-oryx/
>> > > > >>> > > > >
>> > > > >>> > > > > On Fri, Mar 6, 2015 at 12:47 PM, Konstantin Boudnik <[email protected]> wrote:
>> > > > >>> > > > > > Thanks man! I've heard that there's a new project that picks up where Mahout
>> > > > >>> > > > > > left of wrt Hadoop2.x support. But might be I am just delusional from
>> > > > >>> > > > > > hunger...?
>> > > > >>> > > > > >
>> > > > >>> > > > > > On Fri, Mar 06, 2015 at 02:32PM, jay vyas wrote:
>> > > > >>> > > > > >>   i sent a email to mahout-dev... maybe someone will ping back :)
>> > > > >>> > > > > >>   On Fri, Mar 6, 2015 at 2:25 PM, Jay Vyas <[email protected]>
>> > > > >>> > > > > >>   wrote:
>> > > > >>> > > > > >>
>> > > > >>> > > > > >>     Iirc we don't have any maintainers for it.
>> > > > >>> > > > > >>     Is anyone interested in maintaining it?
>> > > > >>> > > > > >>     > On Mar 6, 2015, at 2:23 PM, Konstantin Boudnik <[email protected]> wrote:
>> > > > >>> > > > > >>     >
>> > > > >>> > > > > >>     > Does anyone know what's the story with Mahout? Has it been fixed to be working
>> > > > >>> > > > > >>     > with Hadoop2 or shall we remove it from the BOM?
>> > > > >>> > > > > >>     >
>> > > > >>> > > > > >>     > Cos
>> > > > >>> > > > > >>     >
>> > > > >>> > > > > >>     >> On Sat, Feb 28, 2015 at 06:56PM, Konstantin Boudnik wrote:
>> > > > >>> > > > > >>     >> Guys,
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> It'd be great if we can have the next release ready by ApacheCon in April.
>> > > > >>> > > > > >>     >> Think about all the PR and publicity we can get without any effort on our own.
>> > > > >>> > > > > >>     >> And perhaps from the tactical standpoint we shall call this release 1.0?
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> I believe the only major hurdle between us and the release is CI.
>> > > > >>> > > > > >>     >> Roman, I understand you're busy elsewhere, but could you please let us know what else
>> > > > >>> > > > > >>     >> needs to be done before we can start doing the regular builds and how the
>> > > > >>> > > > > >>     >> community can help. That's the highest priority, IMO.
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> There a couple of the tickets left unfixed/unassigned on BIGTOP-1480, and if
>> > > > >>> > > > > >>     >> they aren't resolved on time we can move them farther. There's lesser than a
>> > > > >>> > > > > >>     >> half-dozen blockers and none of them look too big, honestly. And we have a
>> > > > >>> > > > > >>     >> whole lot of active committers and contributors to wrap-up the release in a
>> > > > >>> > > > > >>     >> couple of weeks.
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> Do we want to try upgrade to HBase 1.x for this release or it might be too big
>> > > > >>> > > > > >>     >> of a distortion? Andrew, what do you think and do you have cycles to do that?
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> What else we need to get done for this release? Suggestions?
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> Is there anyone who wants to step up as the RM this time around?
>> > > > >>> > > > > >>     >> RM doesn't mean that you have to do all the job, but rather be an efficient with
>> > > > >>> > > > > >>     >> a stick ;)
>> > > > >>> > > > > >>     >>
>> > > > >>> > > > > >>     >> Thoughts?
>> > > > >>> > > > > >>     >>   Cos
>> > > > >>> > > > > >>
>> > > > >>> > > > > >>   --
>> > > > >>> > > > > >>   jay vyas
>> > > > >>> >
>> > > > >>> > --
>> > > > >>> > jay vyas
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> jay vyas
>> > > > >>
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > jay vyas
>> > >
>> > >
>> >
>> >
>> > --
>> > jay vyas
>>
>>
>
