Waleed: a fix for this was checked in on January 27. Are you using the
trunk or the 0.4 release? Most people use the trunk, and it is
generally the recommended choice. If you are on the trunk, update to
the latest code.
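
For background, here is a minimal sketch of how Java text usually ends up
as "?" characters. It is not Mahout code, and I am only assuming that the
symptom in your output comes from a charset that cannot represent Arabic
being used somewhere on the output path. The class name EncodingRoundTrip
and the sample word are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Mahout.
public class EncodingRoundTrip {

    // Encode the text to bytes with the given charset, then decode those
    // bytes with the same charset. Characters the charset cannot represent
    // are replaced by the charset's replacement byte during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Four Arabic letters (kaf, ta, alif, ba), written as escapes so
        // the source file itself stays plain ASCII.
        String term = "\u0643\u062A\u0627\u0628";

        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(term, StandardCharsets.UTF_8).equals(term));

        // US-ASCII cannot represent Arabic letters, so each one becomes '?'.
        System.out.println(roundTrip(term, StandardCharsets.US_ASCII));
    }
}
```

If the second line printed matches what you see in the clusterdump output,
the data itself is probably fine and only the charset used when writing or
printing it is wrong.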

Lance

On Thu, Feb 17, 2011 at 3:16 PM, Shige Takeda <[email protected]> wrote:
> Hi, I believe this was already addressed in the following JIRA issue:
> https://issues.apache.org/jira/browse/MAHOUT-594
>
> Thanks, -- Shige
>
> On Thu, Feb 17, 2011 at 3:57 AM, WaleedAzmy <[email protected]> wrote:
>
>>
>> Dear All...
>>
>> I tried to test Mahout k-means clustering on Arabic data, but I think
>> there is a problem with the encoding.
>>
>> I tried the following commands:
>> =======================
>>
>> $ ./mahout seqdirectory -i "....\Arabic_data" -o
>> "....\ArabicTest\Arabic_data-seqdir" -c UTF-8 -chunk 5
>>
>> $ ./mahout seq2sparse -i "....\ArabicTest\Arabic_data-seqdir" -o
>> "....\ArabicTest\Arabic_data_out-seqdir"
>>
>> $ ./mahout kmeans -i "....\ArabicTest\Arabic_data_out-seqdir\tfidf-vectors/"
>> -c "....\ArabicTest\clusters" -o "....\ArabicTest\arabic-kmeans" -x 10 -k 20 -ow
>>
>> $ ./mahout clusterdump -s "....\ArabicTest\arabic-kmeans\clusters-1" -d
>> "....\ArabicTest\Arabic_data_out-seqdir\dictionary.file-0" -dt sequencefile
>> -b 100 -n 20
>>
>>
>> clusterdump generates the following output:
>> ===================================
>>
>> no HADOOP_HOME set, running locally
>> :VL-1{n=1 c=[24:6.187, 31:5.912, 53:7.643, 69:7.958, 77:8.365, ??:2.260,
>> ?????:5.627, ?????:5.627, ??
>>        Top Terms:
>>                ????                                    =>
>>  11.830205917358398
>>                ?????                                   =>
>>  10.808554649353027
>>                ???????                                 =>
>>  8.93863296508789
>>                ?????                                   =>
>>  8.93863296508789
>>                ???????                                 =>
>>  8.93863296508789
>>                ???????                                 =>
>>  8.93863296508789
>>                77                                      =>
>> 8.365219116210938
>>                ????                                    =>
>> 8.365219116210938
>>                ??????                                  =>
>> 8.365219116210938
>>                ???????????                             =>
>> 8.365219116210938
>>                69                                      =>
>> 7.958374977111816
>>                ?????                                   =>
>>  7.6428022384643555
>>                53                                      =>
>>  7.6428022384643555
>>                ???                                     =>
>>  7.6428022384643555
>>                ???                                     =>
>> 7.384960651397705
>>                ?????                                   =>
>> 7.384960651397705
>>                ?????                                   =>
>> 7.166958332061768
>>                24                                      =>
>> 6.186699867248535
>>                31                                      =>
>>  5.9121222496032715
>>                ?????                                   =>
>> 5.627420902252197
>> :VL-104{n=1 c=[??:6.089, ????:5.404, ??????:3.795, ???????:5.915,
>> ??????:7.385, ????????:8.939, ?????
>>        Top Terms:
>>                ????????                                =>
>>  12.641136169433594
>>                ??????                                  =>
>> 9.422260284423828
>>                ?????????                               =>
>>  8.93863296508789
>>                ????                                    =>
>>  8.93863296508789
>>
>>
>> ===============================================================
>> I think the meaningless question marks (?) are an encoding problem. Can
>> anyone help me with this?
>>
>> Also, I would like a tutorial that describes the k-means clustering command
>> and its options, and explains what the clusterdump output represents.
>>
>> Thank you....
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Arabic-K-mean-clustering-tp2518248p2518248.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>
>



-- 
Lance Norskog
[email protected]
