This sounds pretty fishy.
What this is saying is that you have a document in your index whose name is
longer than 65,535 characters.
That doesn't sound very plausible. Don't you have a more appropriate ID
column?
The problem starts where you say --idField text. Pick a better field.
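
To see why a full-text field makes a poor ID, here is a minimal sketch. It assumes the limit being hit is the one in Java's `DataOutput.writeUTF`, which rejects any string whose encoded form exceeds 65,535 bytes; the original error message is not shown in this thread, so that link is an assumption. `IdLengthCheck` and `fitsWriteUtf` are hypothetical names, not part of Mahout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

/** Hypothetical helper (not part of Mahout): does a candidate ID fit within writeUTF's limit? */
final class IdLengthCheck {

    /** Returns true if writeUTF accepts the string, false if its encoding exceeds 65,535 bytes. */
    static boolean fitsWriteUtf(String id) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(id);
            return true;
        } catch (UTFDataFormatException e) {
            // writeUTF throws this when the encoded string is longer than 65,535 bytes.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A short identifier such as a URL (made-up value for illustration).
        String shortId = "http://example.org/doc/1";

        // Stand-in for a document body used as an ID: 70,000 characters.
        char[] body = new char[70_000];
        Arrays.fill(body, 'x');
        String longText = new String(body);

        System.out.println(fitsWriteUtf(shortId));  // prints true
        System.out.println(fitsWriteUtf(longText)); // prints false
    }
}
```

A short, unique field such as a URL or a primary key passes this check easily; a whole document body generally will not.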
On Wed,
I'm using the Mahout 0.7 NaiveBayes algorithm.
I want to use my own data with the algorithm.
Can anyone give some example code for the NaiveBayes algorithm?
I have browsed the Mahout 0.7 NaiveBayes source code.
The NaiveBayes training code is in class
Hi Ted,
That was a stupid mistake. Thanks a lot for quick reply and pointing out
the issue.
I have changed the idField to the link of the document.
*./bin/mahout lucene.vector -d
/home/pacman/DownloadedCodes/solr-4.2.0/example/example-DIH/solr/plaintext/data/index
--idField link -o
Hi,
I'm running k-means clustering on a small sequence file (around 50 KB) on
a 2-node cluster. The block size for this file is 20 KB, so it uses 3 mappers.
I am using CDH4.2.0 with YARN and Mahout 0.7.
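
The mapper count mentioned above follows from simple ceiling division, sketched below. This is a rough estimate only: it assumes one map task per split and a split size equal to the block size, and it ignores the additional rules in Hadoop's real split computation. `SplitEstimate` and `estimateSplits` are hypothetical names.

```java
/** Hypothetical helper: rough number of map tasks for a single input file. */
final class SplitEstimate {

    /** Ceiling of fileBytes / splitBytes; 0 for an empty file. */
    static long estimateSplits(long fileBytes, long splitBytes) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be positive");
        }
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long kb = 1024;
        // 50 KB file, 20 KB blocks: 20 + 20 + 10
        System.out.println(estimateSplits(50 * kb, 20 * kb)); // prints 3
    }
}
```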
If the job runs locally on only one node, CPU usage is around 20% and the job
finishes in
I just ran the RF example (the non-distributed version, BreimanExample)
with the glass data, 10 iterations with 100 trees. Here is the unexpected
output:
13/04/25 15:38:40 INFO df.BreimanExample:
13/04/25 15:38:40 INFO df.BreimanExample: Random Input Test
we apologize if you receive multiple copies of this message
===
CALL FOR PAPERS
2013 Workshop on
Middleware for HPC and Big Data Systems
MHPC '13
as part of Euro-Par 2013, Aachen, Germany
Hi,
After running the command,
./bin/mahout clusterdump -i ../output/fetise/fetise-fkmeans-clusters/ -o
../output/fetise/clusterdump -p ../output/fetise/fetise-fkmeans-centroids/
-d ../output/fetise/luceneDictionary -dm
org.apache.mahout.common.distance.TanimotoDistanceMeasure
My