Re: consistency of StaticWordValueEncoder

2015-01-07 Thread Ted Dunning
On Wed, Jan 7, 2015 at 2:20 PM, chirag lakhani wrote: > In the Mahout in Action book I got the impression that the term "memo" will > seed the random number generator and I wanted to confirm that means I will > have consistency if I deploy this vectorizer in both my Hadoop environment > as well a

consistency of StaticWordValueEncoder

2015-01-07 Thread chirag lakhani
I am trying to vectorize text data for a Naive Bayes classifier that will be trained in Hadoop, and then the corresponding model will be deployed in a Java app. My basic approach is to tokenize a string of text data using Lucene and then encode each token using a StaticWordValueEncoder. Here are the re

Re: structure of part-r-00000 and SequenceFile.Reader NullPointerException

2015-01-07 Thread Brian Dolan
I'm experiencing this as well. Did you find a solution? Bob Morris writes:

Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-07 Thread mw
Hello, the first error was due to a missing property in yarn.xml. However, now I have a different problem. I am working on a web application that should execute LDA on an external YARN cluster. I am uploading all the relevant sequence files onto the YARN cluster. This is how I try to remotely

Re: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-07 Thread AlShater, Hani
Great that it helps. If I were you, I would install HDP 2.2 on EC2 machines and save some cost, then connect one of the machines to my workstation/laptop using Vagrant. When you do so, yarn-client will be better for you. Good luck. On Wed, Jan 7, 2015 at 5:00 PM, Pasmanik, Paul wrote: > Than

RE: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-07 Thread Pasmanik, Paul
Thanks. I've been using HDP 2.1.5 with Spark 1.1.0, which I believe is Hadoop 2.4.0. Yarn-client works but yarn-cluster does not. I've done everything that you described except trying Spark 1.1.1. I'll do that next. My prod deployment will actually be Amazon EMR with Spark and Mahout (I've been

Using Mahout 1.0-SNAPSHOT with yarn cluster

2015-01-07 Thread mw
Hello, I am working on a web application that should execute LDA on an external YARN cluster. I am uploading all the relevant sequence files onto the YARN cluster. This is how I try to remotely execute LDA on the cluster. try { ugi.doAs(new PrivilegedExceptionAction() {

Re: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-07 Thread AlShater, Hani
I have tried spark-itemsimilarity on HDP 2.1. Initially I got the same error you did, but then resolved it. Here are the steps that worked for me: - check that $HADOOP_CONF_DIR is pointing to the right Hadoop config dir. - get Spark 1.1.1 binaries precompiled for Hadoop 2.4. If you are using HDP 2.2 I th