Re: Mahout lucene UTFDataFormatException: encoded string too long:

2013-04-25 Thread Ted Dunning
This sounds pretty fishy. What this is saying is that you have a document in your index whose name is longer than 65,535 characters. That doesn't sound very plausible. Don't you have a more appropriate ID column? The problem starts where you say --idField text. Pick a better field. On Wed,
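For readers hitting the same error: the message "encoded string too long" is the one the JDK's `DataOutputStream.writeUTF` raises when a string's encoded form exceeds 65,535 bytes. The sketch below assumes that this is the call behind the exception in this thread, which the preview does not confirm. The class and helper names (`WriteUtfLimitDemo`, `fitsInWriteUTF`, `repeat`) are made up for illustration and are not part of Mahout or Lucene.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class; not part of Mahout or Lucene.
public class WriteUtfLimitDemo {

    // Returns true if writeUTF accepts the string, false if it rejects it
    // with UTFDataFormatException ("encoded string too long").
    static boolean fitsInWriteUTF(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        // ASCII characters encode to one byte each.
        System.out.println(fitsInWriteUTF(repeat('a', 65535)));
        System.out.println(fitsInWriteUTF(repeat('a', 65536)));
        // U+00E9 encodes to two bytes, so the limit is reached at half the length.
        System.out.println(fitsInWriteUTF(repeat('\u00e9', 32767)));
        System.out.println(fitsInWriteUTF(repeat('\u00e9', 32768)));
    }
}
```

The limit counts encoded bytes rather than characters, so a field holding full document text (as with `--idField text`) exceeds it easily, while a short identifier such as a link or key stays well under it.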

mahout 0.7 NaiveBayes usage

2013-04-25 Thread 蒋玉柱
I'm using the Mahout 0.7 NaiveBayes algorithm and want to run it on my own data. Can anyone give some example code for the NaiveBayes algorithm? I have browsed the Mahout 0.7 NaiveBayes source code. The NaiveBayes training code is in class

Re: Mahout lucene UTFDataFormatException: encoded string too long:

2013-04-25 Thread nishant rathore
Hi Ted, That was a stupid mistake. Thanks a lot for the quick reply and for pointing out the issue. I have changed the idField to the link of the document. ./bin/mahout lucene.vector -d /home/pacman/DownloadedCodes/solr-4.2.0/example/example-DIH/solr/plaintext/data/index --idField link -o

kmeans local vs mapreduce difference

2013-04-25 Thread Mihai Josan
Hi, I'm running k-means clustering on a small sequence file (around 50 KB) on a 2-node cluster. The block size for this file is 20 KB, so it uses 3 mappers. I am using CDH4.2.0 with YARN and Mahout 0.7. If the job runs locally on only one node, CPU usage is around 20% and the job finishes in

Random Forest implementation in Mahout

2013-04-25 Thread qiaoresearcher
I just ran the RF examples, non-distributed version: BreimanExample with glass data, 10 iterations with 100 trees. Here is the unexpected output: 13/04/25 15:38:40 INFO df.BreimanExample: 13/04/25 15:38:40 INFO df.BreimanExample: Random Input Test

CfP 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-04-25 Thread MHPC 2013
We apologize if you receive multiple copies of this message. CALL FOR PAPERS: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC '13), as part of Euro-Par 2013, Aachen, Germany

Re: Mahout lucene UTFDataFormatException: encoded string too long:

2013-04-25 Thread nishant rathore
Hi, After running the command ./bin/mahout clusterdump -i ../output/fetise/fetise-fkmeans-clusters/ -o ../output/fetise/clusterdump -p ../output/fetise/fetise-fkmeans-centroids/ -d ../output/fetise/luceneDictionary -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure My