Re: Mahout Vs Spark

2014-10-23 Thread Ted Dunning
What you say does not imply that numpy can inter-operate with existing Spark machine learning code. It is also certainly the case that no numpy code currently uses Spark. It may well be that users could use numpy in closures being sent to Spark, but that is a far walk from useful parallel numerical co

Re: Mahout Vs Spark

2014-10-23 Thread thejas prasad
Ted, I am not too sure, but I think https://spark.apache.org/faq.html suggests otherwise. It says: Does Spark require modified versions of Scala or Python? No. Spark requires no changes to Scala or compiler plugins. The Python API uses the standard CPython implementation, and can call into existing C

Re: Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

2014-10-23 Thread Benjamin Eckstein
I am trying to create a vector map from Lucene within Java. Mahout is trying to invoke Lucene 3.x classes, not me. I would be grateful if someone could give me sample code showing how to create a vector file from a Lucene 4.x index directory. http://mahout.apache.org/users/basics/creating-vectors-from-text.h

Re: Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

2014-10-23 Thread Suneel Marthi
You can't use Lucene 4.x with Lucene 3.x. Lucene 4.x is not backward compatible with Lucene 3.x. Are you trying to set TermVectors and offsets? If so, it should be done differently with Lucene 4.x; see TestClusterDumper.java for an example. On Thu, Oct 23, 2014 at 7:15 PM, Benjamin Eckstein wrote: >

Re: Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

2014-10-23 Thread Benjamin Eckstein
What information do you need? I use Mahout 0.9 and Lucene 4.6.1 via Maven dependency. These two lines in the main method produce the error: String[] args = {"--field title","--dir ressources/mahout/tmp", "--dictOut term_dictionary.txt","--output sequence.file","--idfield isbn"}; org.apache.mah

Re: Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

2014-10-23 Thread Benjamin Eckstein
Sorry, part 2: String[] args = {"--field title","--dir ressources/mahout/tmp", "--dictOut term_dictionary.txt","--output sequence.file","--idfield isbn"}; org.apache.mahout.utils.vectors.lucene.Driver.main(args); I have posted more details on Stack Overflow; see http://stackoverflow.com/q

Re: Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

2014-10-23 Thread thejas prasad
Can you please provide more information? On Thu, Oct 23, 2014 at 3:51 PM, Benjamin Eckstein wrote: > Hello, i have 2 lines of code, that produces a class not found exception >

Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

2014-10-23 Thread Benjamin Eckstein
Hello, I have 2 lines of code that produce a class-not-found exception

Re: Mahout Vs Spark

2014-10-23 Thread Ted Dunning
Hmmm, I don't think that the array formats used by Spark are compatible with the formats used by numpy. I could be wrong, but even if there isn't outright incompatibility, there is likely to be some significant overhead in format conversion. On Tue, Oct 21, 2014 at 6:12 PM, Vibhanshu Prasad

Re: Upgrade to Spark 1.1.0?

2014-10-23 Thread Pat Ferrel
Off the list, I’ve heard of problems using the Maven artifacts for Spark even when you are not building Spark. There have been reported problems with the serialization class UIDs generated when building Mahout. If you encounter those, try the build method in the PR and report these to the Spark folk

Re: using Mahout to classify customer service and sales emails?

2014-10-23 Thread Mahesh Balija
Hi Ted, what are MapR classifiers? Do you mean MapReduce? Since the data is streaming, should we store it in a database such as a NoSQL DB, export it to Hadoop (if the data is huge), build the model, and deploy the model in production to classify the streaming data in real time? But ho