This brings up a question I have:
How often is trunk pushed to the Apache Maven snapshot repository?
<repository>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
  <name>Apache Snapshots</name>
  <id>apache-snapshots</id>
  <url>http://repository.apache.org/snapshots</url>
</repository>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout</artifactId>
  <version>0.5-SNAPSHOT</version>
</dependency>
Thanks!
Matthew
On Thu, Feb 17, 2011 at 9:44 PM, Lance Norskog <[email protected]> wrote:
> Waleed: a fix for this was checked in on January 27. Are you using the
> trunk, or the 0.4 release? Most people use the trunk, and they
> generally recommend it. If you're on the trunk, it is time to do an
> update to the latest code.
>
> Lance
>
> On Thu, Feb 17, 2011 at 3:16 PM, Shige Takeda <[email protected]> wrote:
>> Hi, I believe the following issue already addresses this problem:
>> https://issues.apache.org/jira/browse/MAHOUT-594
>>
>> Thanks, -- Shige
>>
>> On Thu, Feb 17, 2011 at 3:57 AM, WaleedAzmy <[email protected]> wrote:
>>
>>>
>>> Dear All...
>>>
>>> I tried to test Mahout k-means clustering on Arabic data, but I think
>>> there is an encoding problem.
>>>
>>> I tried the following commands:
>>> =======================
>>>
>>> $ ./mahout seqdirectory -i "....\Arabic_data" -o
>>> "....\ArabicTest\Arabic_data-seqdir" -c UTF-8 -chunk 5
>>>
>>> $ ./mahout seq2sparse -i "....\ArabicTest\Arabic_data-seqdir" -o
>>> "....\ArabicTest\Arabic_data_out-seqdir"
>>>
>>> $ ./mahout kmeans -i
>>> "....\ArabicTest\Arabic_data_out-seqdir\tfidf-vectors/"
>>> -c "....\ArabicTest\clusters" -o "....\ArabicTest\arabic-kmeans"
>>> -x 10 -k 20 -ow
>>>
>>> $ ./mahout clusterdump -s "....\ArabicTest\arabic-kmeans\clusters-1" -d
>>> "....\ArabicTest\Arabic_data_out-seqdir\dictionary.file-0" -dt sequencefile
>>> -b 100 -n 20
>>>
>>>
>>> clusterdump generates the following output:
>>> ===================================
>>>
>>> no HADOOP_HOME set, running locally
>>> :VL-1{n=1 c=[24:6.187, 31:5.912, 53:7.643, 69:7.958, 77:8.365, ??:2.260,
>>> ?????:5.627, ?????:5.627, ??
>>> Top Terms:
>>> ???? =>
>>> 11.830205917358398
>>> ????? =>
>>> 10.808554649353027
>>> ??????? =>
>>> 8.93863296508789
>>> ????? =>
>>> 8.93863296508789
>>> ??????? =>
>>> 8.93863296508789
>>> ??????? =>
>>> 8.93863296508789
>>> 77 =>
>>> 8.365219116210938
>>> ???? =>
>>> 8.365219116210938
>>> ?????? =>
>>> 8.365219116210938
>>> ??????????? =>
>>> 8.365219116210938
>>> 69 =>
>>> 7.958374977111816
>>> ????? =>
>>> 7.6428022384643555
>>> 53 =>
>>> 7.6428022384643555
>>> ??? =>
>>> 7.6428022384643555
>>> ??? =>
>>> 7.384960651397705
>>> ????? =>
>>> 7.384960651397705
>>> ????? =>
>>> 7.166958332061768
>>> 24 =>
>>> 6.186699867248535
>>> 31 =>
>>> 5.9121222496032715
>>> ????? =>
>>> 5.627420902252197
>>> :VL-104{n=1 c=[??:6.089, ????:5.404, ??????:3.795, ???????:5.915,
>>> ??????:7.385, ????????:8.939, ?????
>>> Top Terms:
>>> ???????? =>
>>> 12.641136169433594
>>> ?????? =>
>>> 9.422260284423828
>>> ????????? =>
>>> 8.93863296508789
>>> ???? =>
>>> 8.93863296508789
>>>
>>>
>>> ===============================================================
>>> I think the meaningless question marks (?) are caused by an encoding
>>> problem. Can anyone help me with this?
>>>
>>> I would also like a tutorial that describes the k-means clustering command
>>> and its options, and explains what the clusterdump output represents.
>>>
>>> Thank you....
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Arabic-K-mean-clustering-tp2518248p2518248.html
>>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>>
>>
>
>
>
> --
> Lance Norskog
> [email protected]
>
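
For readers following the encoding question quoted above: one common way runs of `?` appear in Java output is that text was encoded with a charset that cannot represent its characters, so each unmappable character is replaced by that charset's replacement byte. The sketch below only illustrates that general mechanism. It is not Mahout code, and whether this is what happened in the clusterdump output above is an assumption. The class name `EncodingDemo` and the helper `roundTrip` are hypothetical names introduced for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode the text to bytes with the given charset, then decode the bytes
    // back to a String with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // A five-letter Arabic word, written with Unicode escapes so the
        // source file itself does not depend on the compiler's encoding.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // US-ASCII cannot represent Arabic letters, so every character is
        // replaced with '?' during encoding.
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII));

        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));
    }
}
```

If the second line prints `true` but your terminal still shows `?` for the text itself, the console encoding rather than the data may be at fault, which is a separate thing to check.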