Clustering or classification?

Vikas Pandya Tue, 24 Jan 2012 19:44:50 -0800

Thanks. Creating vectors for these three columns and clustering them doesn't
bring the desired results. Here is the use case again.


1) The user searches for some free text. This search goes against Solr and
brings back results.
2) When the user selects one item from the search results, a subsequent query
is made to Solr, passing in the clusterId of the selected record. FYI: I
created the clusters and then indexed the clusterId of each record in the Solr
index, so everything is in one place.
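A minimal sketch of step 2 in Python. It only builds the follow-up request URL for Solr's standard `select` handler, using `q` for the query and `fq` (filter query) to restrict results to the selected record's `clusterId`. It assumes `clusterId` is indexed as a field on every record document, as described above. The core URL `http://localhost:8983/solr/records/select`, the function names and the sample record are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; substitute your own host and core name.
SOLR_SELECT = "http://localhost:8983/solr/records/select"


def same_cluster_params(selected_doc, rows=50):
    """Query parameters returning every record that shares the selected record's clusterId."""
    return {
        "q": "*:*",                                      # match all documents...
        "fq": f"clusterId:{selected_doc['clusterId']}",  # ...then keep one cluster only
        "rows": rows,
        "wt": "json",
    }


def same_cluster_url(selected_doc, rows=50):
    """Full GET URL for the follow-up query (built here, not sent)."""
    return SOLR_SELECT + "?" + urlencode(same_cluster_params(selected_doc, rows))


# Hypothetical record the user clicked in the first result list.
selected = {"id": "abc", "clusterId": 7}
url = same_cluster_url(selected)
```

Putting the cluster restriction in `fq` rather than `q` means it filters without affecting relevance scores, and Solr can cache the filter.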

For example, if the selected record's risk levels are "High", "Medium", "High",
all items in its cluster should have the same RiskLevel values (desired). I
understand that vectors/clusters just go by word similarity, so they don't
care whether "High" appears in RiskLevel1 or RiskLevel3. Hence clustering on
these three columns isn't bringing back the desired results.
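A sketch of why this happens, using records `abc` and `def` from the Cluster 1 table further down the thread (`def` is renamed `def_` because `def` is a Python keyword). Counting words throws away which column each value came from, so both records look identical. Encoding each column separately (one-hot per column, 3 columns x 3 levels = 9 dimensions) keeps them apart. This is plain Python for illustration with hypothetical function names. The same idea would apply to vectors built by hand for Mahout rather than derived from the indexed text.

```python
from collections import Counter

RISK_COLUMNS = ("RiskLevel1", "RiskLevel2", "RiskLevel3")
LEVELS = ("Low", "Medium", "High")


def bag_of_words(record):
    """Word counts only: which column a value came from is lost."""
    return Counter(record[c] for c in RISK_COLUMNS)


def positional(record):
    """One-hot encode each column on its own, so 'High' in RiskLevel1
    and 'High' in RiskLevel3 land in different dimensions."""
    vec = []
    for col in RISK_COLUMNS:
        vec.extend(1.0 if record[col] == lvl else 0.0 for lvl in LEVELS)
    return vec


abc = {"RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "Low"}
def_ = {"RiskLevel1": "Low", "RiskLevel2": "Medium", "RiskLevel3": "High"}
```

`bag_of_words(abc) == bag_of_words(def_)` is true, while their positional vectors differ.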

Today I got an additional requirement to cluster by 8 other columns (on top of
the 3 Risk columns). Those 8 new columns are percentage values.
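One possible way to put the three risk columns and the eight percentage columns into a single numeric vector, sketched in plain Python. Assumptions: Low < Medium < High is a meaningful ordering, mapped to 0, 0.5 and 1 (if it is not, a one-hot encoding per column is the safer choice). The percentages are assumed to be 0-100 values stored in hypothetical columns `Pct1`..`Pct8`. Rescaling everything to [0, 1] keeps the percentages from outweighing the risk levels in a Euclidean distance.

```python
import math

RISK_COLUMNS = ("RiskLevel1", "RiskLevel2", "RiskLevel3")
# Hypothetical names for the 8 percentage columns.
PCT_COLUMNS = tuple(f"Pct{i}" for i in range(1, 9))
# Assumption: the risk levels are ordered and evenly spaced on [0, 1].
RISK_SCALE = {"Low": 0.0, "Medium": 0.5, "High": 1.0}


def to_vector(record):
    """3 ordinal risk features followed by 8 percentages rescaled from 0-100 to 0-1."""
    risk = [RISK_SCALE[record[c]] for c in RISK_COLUMNS]
    pct = [record[c] / 100.0 for c in PCT_COLUMNS]
    return risk + pct


# Two made-up records that differ only in RiskLevel3.
base = {c: 50.0 for c in PCT_COLUMNS}
r1 = {"RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "High", **base}
r2 = {"RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "Low", **base}
distance = math.dist(to_vector(r1), to_vector(r2))  # Euclidean, Python 3.8+
```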

Does this call for classification rather than clustering? I have just started
reading the classification section of Mahout in Action. Opinions, please?


Thanks,


________________________________
 From: Frank Scholten <[email protected]>
To: [email protected] 
Sent: Friday, January 20, 2012 12:48 PM
Subject: Re: How to present mahout cluster in combination with Solr results
 
On Fri, Jan 20, 2012 at 4:01 PM, Vikas Pandya <[email protected]> wrote:
> From the example below, Solr search results should be clustered in the
> following way: list all the items which have matching RiskLevels, e.g.
>
>
> Cluster 1:
> Title    RiskLevel1    RiskLevel2    RiskLevel3
> abc      High          Medium        Low
> xyz      High          Medium        High
> def      Low           Medium        High
>
> Cluster 2:
> Title    RiskLevel1    RiskLevel2    RiskLevel3
> omn      Low           Medium        Low
> yui      Low           Medium        High
> bnm      Medium        Medium        High
>
> Though I have a feeling I don't need to use Mahout clustering for this, I am
> still trying to hook in Mahout, since we have more clustering requirements
> in the pipeline to cluster based on other features (attributes of objects).
>

You only have 27 unique RiskLevel combinations. You could just sort by one
or more RiskLevels to get a sense of the data.
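The point about 27 combinations, sketched in plain Python. With three columns of three levels there are only 3^3 = 27 combinations, so an exact match needs no clustering at all. The sketch gives each combination a stable id from 0 to 26 and groups records by it. An id like this could be indexed in Solr as the `clusterId` from the original use case, so the follow-up query would return only records with identical RiskLevel values. Record `pqr` is made up to show a non-trivial group, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

LEVELS = ("Low", "Medium", "High")
RISK_COLUMNS = ("RiskLevel1", "RiskLevel2", "RiskLevel3")

# Every possible (RiskLevel1, RiskLevel2, RiskLevel3) tuple mapped to a stable id 0..26.
COMBO_ID = {combo: i for i, combo in enumerate(product(LEVELS, repeat=len(RISK_COLUMNS)))}


def combo_id(record):
    """Deterministic group id: identical RiskLevel values always give the same id."""
    return COMBO_ID[tuple(record[c] for c in RISK_COLUMNS)]


def group_by_combo(records):
    """Map each combination id to the titles of the records that have it."""
    groups = defaultdict(list)
    for r in records:
        groups[combo_id(r)].append(r["Title"])
    return dict(groups)


records = [
    {"Title": "abc", "RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "Low"},
    {"Title": "xyz", "RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "High"},
    {"Title": "pqr", "RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "Low"},
]
```

Here `group_by_combo(records)` puts `abc` and `pqr` together and `xyz` on its own.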

If you have more attributes, then you could indeed look into clustering.
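For the case with more attributes, here is a minimal sketch of plain k-means (Lloyd's algorithm) over numeric vectors, such as the risk-plus-percentage vectors mentioned earlier in the thread. Mahout ships its own distributed k-means. This is standalone Python that only illustrates the algorithm, not Mahout's API. The initial centroids are passed in explicitly so the result is deterministic, and the 2-D points below are toy values.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance; ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's k-means with caller-supplied starting centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignment = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        # Update step: move each centroid to the mean of its member points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
        # Assignment step: reassign every point to its nearest centroid.
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
    return assignment, centroids


points = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
assignment, centroids = kmeans(points, [[0.0, 0.0], [1.0, 1.0]])
```

Unlike the exact grouping by RiskLevel combination, k-means only guarantees that members of a cluster are close, not identical.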

Cheers,

Frank

> Any thoughts?
>
> ________________________________
> From: Vikas Pandya <[email protected]>
> To: Frank Scholten <[email protected]>; "[email protected]"
> <[email protected]>
> Sent: Thursday, January 19, 2012 11:05 AM
>
> Subject: Re: How to present mahout cluster in combination with Solr results
>
> Hi Frank,
>
> Thanks for the link. That was useful. It's still a bit unclear how he built
> his index. Are we saying we index clusterId, clusterSize and clusterLabel
> in the same index (where the other data is indexed)? So one index will have
> two sets of Solr documents in it, one containing the cluster info?
>
> My requirement again: I have a bunch of DB columns which are being indexed,
> e.g.
> Title     RiskLevel1    RiskLevel2    RiskLevel3    etc.
> Title1    High          Medium        Low
>
> The current requirement is to cluster documents based on their RiskLevels
> and NOT the title.
>
> Thanks,
>
>
> ________________________________
> From: Frank Scholten <[email protected]>
> To: [email protected]; Vikas Pandya <[email protected]>
> Sent: Thursday, January 19, 2012 4:24 AM
> Subject: Re: How to present mahout cluster in combination with Solr results
>
> Hi Vikas,
>
> I suggest indexing the cluster label, cluster size and
> cluster-document mappings so you can use that information to build a
> tag cloud of your data. Check out this presentation:
> http://java.dzone.com/videos/configuring-mahout-clustering
>
> Cheers,
>
> Frank
>
> On Thu, Jan 19, 2012 at 4:18 AM, Vikas Pandya <[email protected]> wrote:
>> Hello,
>>
>> I have successfully created vectors by reading my existing Solr index.
>> Then I created a SequenceFile and Mahout clusters from it. As I understand
>> it, Solr and Mahout clustering currently aren't integrated, so what's the
>> best way to present Mahout clusters to the user? Mine is a search
>> application which renders results by querying the Solr index. Now I need to
>> incorporate the Mahout-created clusters into the results. While Solr-Mahout
>> integration isn't there yet, what's the best alternative way to present
>> this info?
>>
>> Thanks,
>
