Re: Interpretation of cluster output

Andrew Musselman Wed, 18 Jun 2014 09:58:31 -0700

Interesting; could be a bug, I'll take a look.


On Tue, Jun 17, 2014 at 10:38 AM, Han Fan <[email protected]> wrote:

> Is this command line what you need? (Replace /user/root/testdataout with
> your output directory)
> $ mahout seqdumper -i /user/root/testdataout/data/part-m-00000
> Key: 9: Value: {0:1.0,2:-0.956,1:-0.213,5:0.091,3:-0.003,7:-0.024,6:0.017,8:1.0,4:0.056}
> Key: 9: Value: {0:1.0,2:2.129,1:3.147,5:-0.063,3:-0.006,7:0.109,6:-0.002,4:-0.056}
> Key: 9: Value: {0:1.0,2:-2.718,1:-2.165,5:-0.103,3:-0.008,7:-0.024,6:-0.156,8:1.0,4:0.043}
> ...
>
> Sorry if I misunderstood.
>
>
>
>
> On 16/6/14 3:44 pm, Kamesh wrote:
>
>> Thanks for the response, Andrew. I am using Mahout 0.9. I also tried the
>> trunk version, but I still get output in the following format:
>>
>> C-55{n=1 c=[15993058.000] r=[]}
>> C-56{n=2 c=[15993061.167] r=[]}
>> C-57{n=1 c=[15993062.000] r=[]}
>>
>> C-97{n=1 c=[15993103.000] r=[]}
>> C-98{n=2 c=[15993119.333] r=[0.395]}
>> C-99{n=1 c=[15993105.000] r=[]}
>>
>> so I cannot figure out which data points are inside each cluster.
>>
>> Also, when I run with "-of JSON" I get an NPE:
>>
>> Exception in thread "main" java.lang.NullPointerException
>> at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
>> at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
>> at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
>> at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
>>
>> I am running the cluster dump as follows:
>>
>> hadoop jar mahout-integration-1.0-SNAPSHOT.jar \
>>   org.apache.mahout.utils.clustering.ClusterDumper \
>>   -i /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
>>
>> Also I have observed that the *part* file created inside *clusteredPoints*
>> is empty.
>>
>> Please help me get the data points from each cluster.
>>
>>
>> On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <
>> [email protected]> wrote:
>>
>>> That's going to be easier if you can work off of trunk, since the output of
>>> clustering has been cleaned up to write a better format, per
>>> https://issues.apache.org/jira/browse/MAHOUT-1505
>>>
>>> E.g.,
>>>
>>> {
>>>    "top_terms": [
>>>      {"all":3.0149030685424805},
>>>      {"english":3.0149030685424805},
>>>      {"best":3.0149030685424805},
>>>      {"spaniel":3.0149030685424805},
>>>      {"springer":3.0149030685424805},
>>>      {"dogs":1.9162907600402832}
>>>    ],
>>>    "cluster_id": 7,
>>>    "cluster": {
>>>      "r": [],
>>>      "c": [
>>>        {"all":3.015},
>>>        {"best":3.015},
>>>        {"dogs":1.916},
>>>        {"english":3.015},
>>>        {"spaniel":3.015},
>>>        {"springer":3.015}
>>>      ],
>>>      "n": 1,
>>>      "identifier": "C-7"
>>>    },
>>>    "points": [
>>>      {
>>>        "point": [
>>>          {"all":3.015},
>>>          {"best":3.015},
>>>          {"dogs":1.916},
>>>          {"english":3.015},
>>>          {"spaniel":3.015},
>>>          {"springer":3.015}
>>>        ],
>>>        "vector_name": "P(14)",
>>>        "weight": "1.0"
>>>      }
>>>    ]
>>> }
>>>
>>>
>>> On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <[email protected]> wrote:
>>>
>>>> Hi All,
>>>> Please help me in getting the data points inside each cluster.
>>>> The output of the clustering algorithm is the center and radius of each
>>>> cluster. How do we derive the actual data points inside each cluster
>>>> from this output?
>>>>
>>>> --
>>>> Kamesh.
>>>>
>>>>
>>>
>>
>>
>>
>
>
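
For anyone following the thread: once the JSON output works, the points in each cluster can be pulled out of the dump with a short script. The sketch below is in Python and assumes the dump has the shape quoted above (a "cluster_id" plus a "points" list whose entries carry "vector_name"). Two things here are assumptions and not confirmed in the thread: that the dump holds one JSON object per line, and the helper name points_by_cluster, which is made up for illustration.

```python
import json


def points_by_cluster(lines):
    """Map cluster_id -> list of vector names.

    `lines` is any iterable of strings, each assumed to hold one JSON
    object shaped like the ClusterDumper example quoted in the thread.
    """
    members = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Register the cluster even when it has no points, so that
        # empty clusters still show up in the result.
        bucket = members.setdefault(record["cluster_id"], [])
        for point in record.get("points") or []:
            bucket.append(point.get("vector_name"))
    return members


# Abbreviated copy of the cluster quoted in the thread.
sample = (
    '{"cluster_id": 7, "cluster": {"n": 1, "identifier": "C-7"}, '
    '"points": [{"point": [{"dogs": 1.916}], '
    '"vector_name": "P(14)", "weight": "1.0"}]}'
)
print(points_by_cluster([sample]))
```

To run it on a real dump, pass an open file handle in place of the sample list. If the dump turns out not to be one object per line, the reading loop would need to change, but the per-record logic stays the same.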
