Re: Adapters for mahout inputs .... anyone working on this?

2014-02-22 Thread Jay Vyas
Yes, it will be tricky. But that said, I think we should be able to simply change the existing parsers to be more flexible, to accommodate at least slightly more diverse inputs. For example, variable columns in CSV, etc...
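A minimal sketch of what a column-tolerant CSV parser could look like. This is purely illustrative (the class and method names are hypothetical, not existing Mahout code): rows with too few columns are padded with empty strings and rows with extras are truncated, so downstream code sees a fixed width.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a CSV parser tolerant of variable column counts.
public class FlexibleCsvDemo {

    // Split a CSV line, then pad missing trailing columns with empty strings
    // (or truncate extras) so callers can rely on a fixed number of fields.
    static String[] parseLine(String line, int expectedColumns) {
        String[] raw = line.split(",", -1);          // -1 keeps trailing empty fields
        if (raw.length >= expectedColumns) {
            return Arrays.copyOf(raw, expectedColumns); // drop extra columns
        }
        String[] padded = Arrays.copyOf(raw, expectedColumns); // nulls at the tail
        Arrays.fill(padded, raw.length, expectedColumns, "");  // replace nulls with ""
        return padded;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a,b,c", "a,b", "a,b,c,d");
        for (String l : lines) {
            System.out.println(Arrays.toString(parseLine(l, 3)));
        }
    }
}
```

Padding rather than rejecting short rows is one design choice; a real adapter might instead emit sparse vectors keyed by column index.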

Re: OutOfMemoryError: Java Heap Space in DocumentProcessor.tokenizeDocuments

2014-02-22 Thread Johannes Schulte
I would pass the memory parameters in the args array directly. The Hadoop-specific arguments must come before your custom arguments, so like this: String[] args = new String[]{"-Dmapreduce.map.memory.mb=12323", "customOpt1"}; ToolRunner.run(..args). The ToolRunner takes care of putting the hadoop sp
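The argument ordering above can be sketched as follows. This is a hedged illustration: the actual `ToolRunner.run(...)` call needs Hadoop and the Mahout tool class on the classpath, so it is shown commented out; the point being demonstrated is that Hadoop's `GenericOptionsParser` only consumes `-D` options that appear before any custom arguments.

```java
// Sketch of the suggested invocation. Hadoop-specific -D options must come
// first; GenericOptionsParser stops consuming them once it hits a custom arg.
public class ToolRunnerArgsDemo {

    public static String[] buildArgs() {
        return new String[] {
            // Hadoop-specific options first (picked up by GenericOptionsParser)
            "-Dmapreduce.map.memory.mb=12323",
            // custom tool arguments afterwards
            "customOpt1"
        };
    }

    public static void main(String[] args) throws Exception {
        String[] toolArgs = buildArgs();
        // With Hadoop on the classpath, the call would look like:
        // int exitCode = ToolRunner.run(new Configuration(), new MyTool(), toolArgs);
        System.out.println(String.join(" ", toolArgs));
    }
}
```

Note that `mapreduce.map.memory.mb` controls the container size; for a Java heap OOM you may also need to raise the JVM heap itself via `mapreduce.map.java.opts` (e.g. `-Xmx...`), passed the same way.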

Re: Adapters for mahout inputs .... anyone working on this?

2014-02-22 Thread Ted Dunning
Robin, This is a great example of how the problem is a bit harder for Mahout. Many of the large datasets of interest will not fit in memory. That means that mixed memory and disk formats are important in this flow, which also makes APIs more complicated. On Sat, Feb 22, 2014 at 10:12 AM, Rob

Re: Adapters for mahout inputs .... anyone working on this?

2014-02-22 Thread Robin East
Various existing frameworks like R or python/sklearn already have established interfaces, so it would be good to follow some existing patterns. I personally find sklearn's transform/fit/predict pattern very easy to use and understand. Sent from my iPhone On 22 Feb 2014, at 09:15, Gokhan Capan wrote:
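For readers unfamiliar with the sklearn pattern being referenced, here is a minimal sketch of what it could look like in Java. All names here (Transformer, MeanCenterer) are hypothetical illustrations, not Mahout or sklearn APIs: fit learns parameters from data, transform applies them, and returning `this` from fit allows chaining.

```java
import java.util.Arrays;

// Illustrative fit/transform interface in the sklearn style (hypothetical names).
interface Transformer {
    Transformer fit(double[] data);    // learn parameters from training data
    double[] transform(double[] data); // apply the learned transformation
}

// Example transformer: subtracts the mean learned during fit.
class MeanCenterer implements Transformer {
    private double mean;

    @Override
    public Transformer fit(double[] data) {
        mean = Arrays.stream(data).average().orElse(0.0);
        return this; // enables fit(...).transform(...) chaining
    }

    @Override
    public double[] transform(double[] data) {
        return Arrays.stream(data).map(x -> x - mean).toArray();
    }
}

public class FitTransformDemo {
    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        double[] centered = new MeanCenterer().fit(data).transform(data);
        System.out.println(Arrays.toString(centered)); // [-1.0, 0.0, 1.0]
    }
}
```

The appeal of this pattern is that every estimator, whatever its algorithm, exposes the same two or three verbs, which keeps pipelines composable.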

Re: Adapters for mahout inputs .... anyone working on this?

2014-02-22 Thread Gokhan Capan
I'm personally positive on this. Could you give an example code snippet that shows how the usage is going to be? Sent from my iPhone > On Feb 22, 2014, at 5:37, Jay Vyas wrote: > > Hi Ted. Sure, I will take a look. > > >> On Fri, Feb 21, 2014 at 7:51 PM, Ted Dunning wrote: >> >> Great idea.