Mahout - Solr vs Mahout Lucene Question
Hi, I hate to double-post, but I'm not sure in which domain the answer to my question lies, so here's the link to my question on the Mahout groups. Basically, I'm getting different clustering results depending on whether I index the data with Solr or Lucene. Please post any responses to the original question. Thanks

http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-td4036013.html

--
View this message in context: http://lucene.472066.n3.nabble.com/Mahout-Solr-vs-Mahout-Lucene-Question-tp4036014.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Mahout Solr
You're right... It would be nice to be able to see the clustering results coming from Solr, though.

Adam

On Thu, Jun 16, 2011 at 3:21 AM, Andrew Clegg andrew.clegg+mah...@gmail.com wrote:
> Well, it does have the ability to pull TermVectors from an index:
> https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html#CreatingVectorsfromText-FromLucene
> Nothing Solr-specific about it, though.
>
> On 15 June 2011 15:38, Mark static.void@gmail.com wrote:
>> [...]
>
> --
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
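For readers unfamiliar with term vectors: the Mahout page Andrew links describes turning the term vectors stored in a Lucene index into numeric vectors, which Mahout's clustering jobs (k-means, for example) then compare using a distance measure. The plain-Java sketch below only illustrates that idea and is not Mahout code. It assumes a term vector can be treated as a sparse term-to-frequency map (raw counts, no TF-IDF weighting) and computes the cosine distance between two of them. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration, not Mahout's implementation: a document's term vector
 * as a sparse map of term -> raw frequency, plus cosine distance between two such maps.
 */
public final class TermVectorSketch {

    /** Counts how often each term occurs in an already-tokenized document. */
    static Map<String, Integer> termFrequencies(String... terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : terms) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    /** Cosine distance = 1 - (a . b) / (|a| |b|); 0 means the vectors point the same way. */
    static double cosineDistance(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        // Only terms present in both maps contribute to the dot product.
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other.doubleValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0 || normB == 0) {
            return 1.0; // treat an empty vector as maximally distant
        }
        return 1.0 - dot / (normA * normB);
    }

    /** Euclidean length of a sparse frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("solr", "mahout", "clustering");
        Map<String, Integer> d2 = termFrequencies("mahout", "clustering", "clustering");
        System.out.println(cosineDistance(d1, d2));
    }
}
```

Cosine distance is used here because it ignores document length; Mahout ships several distance measures, and which one a given clustering run uses is a configuration choice.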
Mahout Solr
Apache Mahout is a new Apache TLP project to create scalable machine learning algorithms under the Apache license. It is related to other Apache Lucene projects and integrates well with Solr.

How does Mahout integrate well with Solr? Can someone give a brief overview of what's available? I'm guessing one of the features would be replacing the Carrot2 clustering algorithm with something a little more sophisticated?

Thanks
Re: Mahout Solr
The only integration at this point (as far as I can tell) is that Mahout can read the Lucene index created by Solr. I agree that it would be nice to swap out the Carrot2 clustering engine for Mahout's set of algorithms, but that has not been done yet. Grant has pointed out that you can use Solr's callback system to fire off another task, such as a Mahout job.

http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

Adam

On Wed, Jun 15, 2011 at 10:38 AM, Mark static.void@gmail.com wrote:
> [...]
Re: Mahout Solr
I was hoping this wasn't the case :( Is it possible to use the clustering component with predefined clusters generated by Mahout?

On 6/15/11 9:14 AM, Sean Owen wrote:
> Hmm, I suppose I have the same question from the Mahout side (I didn't write that text). I would certainly call this far more related to Hadoop than Lucene. There are some Lucene touch-points, but no direct connection to Solr that I'm aware of. If I'm not wildly mistaken, I can edit the wiki.
>
> On Wed, Jun 15, 2011 at 3:38 PM, Mark static.void@gmail.com wrote:
>> [...]
Re: Mahout Solr
> Is it possible to use the clustering component with predefined clusters generated by Mahout?

Actually, the existing Solr ClusteringComponent's API has been designed to deal with both search results clustering (implemented by Carrot2) and offline clustering of the whole index. The latter has not been implemented yet, so the API is very likely to change depending on the specific design decisions. Should clustering be triggered through Solr or externally? Should the clusters be stored in Solr? How should new documents be handled? How should the clusters be used at search time?

I can also imagine a simpler approach based on a search results clustering algorithm that would simply fetch Mahout's predefined clusters for each document returned in the search results. Getting this to work is a matter of implementing a dedicated SearchClusteringEngine (http://lucene.apache.org/solr/api/org/apache/solr/handler/clustering/SearchClusteringEngine.html) and should be fairly straightforward, at least in terms of interaction with Solr.

Staszek
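The "simpler approach" Staszek describes can be sketched in plain Java. This is a hypothetical illustration, not Solr's SearchClusteringEngine API: it assumes an offline Mahout run has already produced a docId-to-clusterId mapping (how that mapping is exported and loaded is left open), and it groups the ids of a ranked result list by that mapping, keeping clusters in the order of their best-ranked hit. Documents the offline run never saw, such as newly indexed ones, fall into a hypothetical "unassigned" bucket, which is one possible answer to the "how should new documents be handled" question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: group search hits by clusters that an offline Mahout run
 * assigned beforehand, instead of clustering the results on the fly.
 * Plain Java only; not Solr's SearchClusteringEngine API.
 */
public final class PrecomputedClusterGrouping {

    /** Label for hits the offline run never assigned (e.g. documents indexed later). */
    static final String UNASSIGNED = "unassigned";

    /**
     * @param rankedDocIds ids of the documents in the search result, best hit first
     * @param clusterOf    docId -> clusterId mapping produced offline (assumed format)
     * @return clusterId -> docIds, clusters ordered by their best-ranked member
     */
    static Map<String, List<String>> group(List<String> rankedDocIds,
                                           Map<String, String> clusterOf) {
        // LinkedHashMap keeps clusters in the order their first (best-ranked) hit appears.
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String docId : rankedDocIds) {
            String clusterId = clusterOf.getOrDefault(docId, UNASSIGNED);
            clusters.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(docId);
        }
        return clusters;
    }

    public static void main(String[] args) {
        Map<String, String> clusterOf = new HashMap<>();
        clusterOf.put("doc1", "cluster-A");
        clusterOf.put("doc2", "cluster-B");
        clusterOf.put("doc3", "cluster-A");
        List<String> hits = Arrays.asList("doc2", "doc1", "doc4", "doc3");
        // prints {cluster-B=[doc2], cluster-A=[doc1, doc3], unassigned=[doc4]}
        System.out.println(group(hits, clusterOf));
    }
}
```

A real engine would wrap logic like this inside a SearchClusteringEngine subclass and return the groups in whatever form that class expects; those details depend on the Solr version and are not shown here.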
Re: Mahout Solr
Hmm, I suppose I have the same question from the Mahout side (I didn't write that text). I would certainly call this far more related to Hadoop than Lucene. There are some Lucene touch-points, but no direct connection to Solr that I'm aware of. If I'm not wildly mistaken, I can edit the wiki.

On Wed, Jun 15, 2011 at 3:38 PM, Mark static.void@gmail.com wrote:
> [...]