Hi Syed,

I have never used Lucene or Solr myself, so I can only rephrase what's in the 
Mahout wiki. Take a look at how to convert a Lucene (== Solr) index to 
a Mahout-compatible vector format [1]. Also have a look at the JavaDocs [2], 
especially KMeansDriver and CosineDistanceMeasure. To use your Solr 
index for clustering (I assume KMeans clustering; other clustering algorithms 
should work the same way), you create a SequenceFile from your index as 
described in the wiki [1]. After that, create a new job using KMeansDriver by 
calling the constructor with the input directory, output directory, ... and, 
most importantly, the CosineDistanceMeasure as parameters. That's it, at least 
the hard part; for the actual clustering you just call run on your job and sit 
back and relax.

To be clear, I assume you are using KMeans clustering on the BODY text 
of your Solr index. The process described should apply to the other 
clustering algorithms as well.
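
To illustrate what KMeans does with a distance measure, here is a minimal 
plain-Java sketch. It is not Mahout's KMeansDriver and uses none of Mahout's 
classes; the class name KMeansSketch, its methods, and the toy data are 
made up for illustration, and the initial centroids are simply passed in.

```java
import java.util.Arrays;

// Plain-Java illustration of k-means with cosine distance.
// Hypothetical names throughout; this is NOT Mahout's API.
class KMeansSketch {

    // Cosine distance = 1 - (a . b) / (|a| * |b|); defined as 1.0 for a zero vector.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the centroid with the smallest cosine distance to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (cosineDistance(point, centroids[c]) < cosineDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs a fixed number of assign/update rounds and returns the cluster
    // index of each point as computed in the last assignment step.
    static int[] cluster(double[][] points, double[][] initialCentroids, int iterations) {
        double[][] centroids = new double[initialCentroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            centroids[c] = initialCentroids[c].clone();
        }
        int[] assignment = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each point joins its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                assignment[p] = nearest(points[p], centroids);
            }
            // Update step: each centroid becomes the mean of its points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[points[0].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] == c) {
                        for (int d = 0; d < sum.length; d++) {
                            sum[d] += points[p][d];
                        }
                        count++;
                    }
                }
                if (count > 0) { // keep the old centroid if the cluster is empty
                    for (int d = 0; d < sum.length; d++) {
                        sum[d] /= count;
                    }
                    centroids[c] = sum;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = { {1.0, 0.1}, {0.9, 0.2}, {0.1, 1.0}, {0.2, 0.8} };
        double[][] start = { {1.0, 0.0}, {0.0, 1.0} };
        System.out.println(Arrays.toString(cluster(points, start, 5))); // prints [0, 0, 1, 1]
    }
}
```

Mahout runs the same idea as Hadoop jobs over the vectors in your 
SequenceFile, so you do not write this loop yourself.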

Basically, you just need your input data (the data to be clustered) as an 
iterable collection of vectors, plus a distance measure that works on that 
data. In your case this is the body text of the index and cosine distance. 
And again, for your stated use case you don't really want just three 
dimensions: every distinct word in your body text represents a new dimension, 
so you most likely have far more than three.
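
As a rough illustration of "every word is a dimension", the following 
plain-Java sketch turns two texts into term-count vectors and computes their 
cosine distance. It does not use Mahout or Lucene; the class CosineSketch, 
its methods, and the simple lowercase/split tokenization are assumptions made 
up for this example, not what the Lucene-to-vector conversion in [1] does.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration; hypothetical names, NOT Mahout's API.
class CosineSketch {

    // Count how often each lowercase word occurs.
    // Every distinct word becomes one dimension of the (sparse) vector.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine distance = 1 - (a . b) / (|a| * |b|).
    // Defined here as 1.0 if either vector is empty.
    static double cosineDistance(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += (double) va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += (double) va * vb; // only shared words contribute
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = termCounts("Mahout clusters text documents");
        Map<String, Integer> doc2 = termCounts("Solr indexes text documents");
        // Four distinct words per document, two of them shared.
        System.out.println(doc1.size() + " dimensions in doc1");
        System.out.println("distance = " + cosineDistance(doc1, doc2));
    }
}
```

Documents that share no words end up at distance 1, and documents with the 
same word proportions end up at distance 0, regardless of their length.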

If I totally missed your point, please speak up.


So long,
Christoph

[1] https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text#CreatingVectorsfromText-FromLucene
[2] https://builds.apache.org/job/Mahout-Quality/javadoc/


On 05.12.2011 at 15:15, syed kather wrote:

> Thanks, Christoph.
> Can you give a sample of clustering in 3D that I can understand?
> 
> Please help me so that I can learn new things. Is it possible using
> Solr? If so, how can I do that?
> 
> 
>            Thanks and Regards,
>        S SYED ABDUL KATHER
>                9731841519
> 
> 
> On Mon, Dec 5, 2011 at 7:12 PM, Christoph Brücke <
> [email protected]> wrote:
> 
>> Hi Syed,
>> 
>> To answer your first question: YES, Mahout is totally capable of clustering
>> in three dimensions. However, as far as my knowledge of KMeans clustering
>> goes, each feature (dimension) has to be of the same type. That means
>> there has to be one distance metric that is capable of expressing the
>> distance between every two points. That said, I don't think that you can
>> define a metric which uses seqid, text and text (filepath) as coordinates.
>> But I think you could just use the body of your index and calculate
>> something like cosine distance to cluster your index entries, as seqid is
>> probably unique to every entry and the file path is not really relevant (at
>> least I can't come up with any suitable use case).
>> 
>> TL;DR: Yes, you can cluster in multiple dimensions as long as you can
>> define a distance between every pair of points. You are probably better off
>> using just the body text of your Solr index.
>> 
>> Regards,
>> Christoph
>> 
>> 
>> On 05.12.2011 at 14:09, syed kather wrote:
>> 
>>> Team,
>>> 
>>>    Is it possible to cluster in 3D?
>>> 
>>> 
>>> I am trying a case like the one given below.
>>> 
>>> 1.  I have a Solr index with three fields (SEQID, BODY (content of
>>> text file), FILEPATH).
>>> 
>>> Now I need to cluster this. Please help me with how to do this. Is there a way?
>>> 
>>>           Thanks and Regards,
>>>       S SYED ABDUL KATHER
>> 
>> 
>> 

Christoph Brücke
[email protected]


