It sounds like most questions concern project setup. I agree it's not trivial to understand how to use Maven, though that's a question of learning Maven rather than Mahout. Likewise for Eclipse, or for IntelliJ, or for integrating this into an Ant project, or setting up a servlet container. There are a hundred setups you could document. All are useful; it's just a load of work to write and maintain!
There is generally much more work that could be done than there is active contribution to the project. So this sort of activity in the wiki or project is most welcome, as it probably won't get done otherwise. The code itself still needs plenty of work, let alone the documentation, which I don't think anyone is meaningfully working on at the moment. Parts of the implementation have already gone a bit stale and are in some ways inconsistent with other parts, which contributes to the getting-started problem.

The scope of the project is ambitious, and it is hard to turn this stuff into a simple API: the underlying mechanics are complex enough, and the use cases so varied, that a really complete packaging and treatment of it, even just in the code, would be a project five times larger. Beyond basic usage, you need to be at HEAD in Subversion and in the source code to understand and work with it anyway. This is just going to be hard to introduce gently no matter what.

I think it gets better over time as things slowly mature and ways of doing things on Hadoop get more standard. In the short term it's only going to improve through contributions, so these are all very welcome. Putting this into the project wiki would be ideal.

On Sun, Feb 20, 2011 at 3:45 PM, Dan Brickley wrote:
> On 19 February 2011 22:52, Petr Cvengros wrote:
>> Yeah, I believe the book is great. Its only problem is that it isn't
>> free and the first free chapter doesn't give much details on setting
>> Mahout up. People who would like to experiment a bit with an open
>> source library usually aren't willing to buy a $30 book. Anyway, I
>> think at least the introductory documentation should be freely
>> available.
>
> Yes. I recently bought the book, and it's worth every penny.
>
> I've been experimenting with Mahout since last year, but it took quite
> a bit of messing around before getting to the stage where I knew it
> was something that I wanted a book about. Having a bit more gentle
> intro material free online would only complement the book, and help
> people realise they'd benefit from it.
>
> My learning experience was fairly positive so far. I first followed
> the Taste demo and happily swapped out the data file for one of my
> own, which gave me a simple demo recommender without writing a line of
> code. That simple exercise was very reassuring.
>
> After that, the trail scattered in various directions and I felt
> relatively lost. Should I take the ant build file from the demo .zip
> at http://www.ibm.com/developerworks/java/library/j-mahout/ and
> explore from there? Or build on top of Maven? Or on the command line,
> since there seem to be a lot of command-line utilities? And that
> familiar "I guess I should try using Eclipse again" feeling. If I care
> about the Hadoop side, how does that affect my choices? If I want to
> code in another language should I build REST services with the HTTP
> service machinery that comes with Taste, or is that just for demos,
> just for recommenders? I doubt there are 'right' answers to all of
> these, but I felt as if I stood at a crossroads with interesting arrows
> pointing in every direction. So buying the book gives a bit of
> structure to all this. While not a natural user of Eclipse, I got set
> up and worked through the examples (copy-paste from pdf mostly, a bit
> frustrating but forced me to really read the code). This helped hugely
> in getting from "I'm pretty sure I heard that Mahout does X" to having
> running code on my machine that actually does it...
>
> cheers,
>
> Dan
>
