Mahout - Solr vs Mahout Lucene Question

2013-01-24 Thread vybe3142
Hi,
I hate to double-post, but I'm not sure in which domain the answer to my
question lies, so here's the link to my question on the Mahout list.

Basically, I'm getting different clustering results depending on whether I
index the data with Solr or with plain Lucene. Please post any responses
against the original question.

Thanks

http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-td4036013.html



Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mahout Solr

2011-06-16 Thread Adam Estrada
You're right. It would be nice to be able to see the cluster results coming
from Solr, though.

Adam

On Thu, Jun 16, 2011 at 3:21 AM, Andrew Clegg andrew.clegg+mah...@gmail.com
 wrote:

 Well, it does have the ability to pull TermVectors from an index:


 https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html#CreatingVectorsfromText-FromLucene

 Nothing Solr-specific about it though.




Mahout Solr

2011-06-15 Thread Mark
Apache Mahout is a new Apache top-level project (TLP) to create scalable
machine learning algorithms under the Apache license. It is related to other
Apache Lucene projects and integrates well with Solr.


How does Mahout integrate well with Solr? Can someone give a brief
overview of what's available? I'm guessing one of the features would be
the replacement of the Carrot2 clustering algorithm with something a
little more sophisticated?


Thanks


Re: Mahout Solr

2011-06-15 Thread Adam Estrada
The only integration at this point (as far as I can tell) is that Mahout can
read the Lucene index created by Solr. I agree that it would be nice to swap
out the Carrot2 clustering engine for Mahout's set of algorithms, but that
has not been done yet. Grant has pointed out that you can use Solr's
callback system to fire off another task, such as a Mahout job.

http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
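To make "Mahout can read the Lucene index" a bit more concrete: the vectors that get clustered are built from the per-document term vectors stored in the index (the TermVectors Andrew mentions). Below is a hypothetical, generic TF-IDF version of that step in Python. The input format and the weighting (tf * log(N / df), then L2 normalisation) are assumptions for illustration, not necessarily the exact formula Mahout applies. It also hints at one plausible reason a Solr-built index and a Lucene-built index can cluster differently: different analyzers produce different terms, and therefore different vectors.

```python
import math
from collections import Counter


def tfidf_vectors(term_vectors):
    """Turn per-document term frequencies into L2-normalised TF-IDF vectors.

    term_vectors: {doc_id: {term: freq}}, a hypothetical stand-in for the
    term vectors stored in a Lucene/Solr index (assumed input format).
    Weighting is plain tf * log(N / df); this is an illustrative choice,
    not necessarily Mahout's own weighting.
    """
    n_docs = len(term_vectors)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tv in term_vectors.values() for term in tv)
    vectors = {}
    for doc_id, tv in term_vectors.items():
        # Terms that occur in every document get weight 0.
        weights = {t: f * math.log(n_docs / df[t]) for t, f in tv.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalise so document length does not dominate distances;
        # an all-zero vector is returned unchanged.
        vectors[doc_id] = (
            {t: w / norm for t, w in weights.items()} if norm else weights
        )
    return vectors
```

For example, with {"d1": {"solr": 2, "search": 1}, "d2": {"mahout": 3, "search": 1}} the shared term "search" gets weight 0 and each document reduces to its distinctive term. If one index had tokenized "Solr" and "solr" as separate terms, the resulting vectors, and possibly the clusters, would differ.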

Adam




Re: Mahout Solr

2011-06-15 Thread Mark

I was hoping this wasn't the case :(

Is it possible to have the clustering component use predefined
clusters generated by Mahout?


On 6/15/11 9:14 AM, Sean Owen wrote:

Hmm, I suppose I have the same question from the Mahout side (I didn't
write that text). I would certainly call this far more related to
Hadoop than Lucene. There are some Lucene touch-points, but no
direct connection to Solr that I'm aware of.

If I'm not wildly mistaken then I can edit the wiki.




Re: Mahout Solr

2011-06-15 Thread Stanislaw Osinski

 Is it possible to have the clustering component use predefined clusters
 generated by Mahout?


Actually, the existing Solr ClusteringComponent's API has been designed to
deal with both search-results clustering (implemented by Carrot2) and
offline clustering of the whole index. The latter has not been implemented
yet, so the API is very likely to change depending on the specific design
decisions: Should clustering be triggered through Solr or externally? Should
the clusters be stored in Solr? How should new documents be handled? How
should the clusters be used at search time?

I can also imagine a simpler approach based on a search-results clustering
algorithm that would simply fetch Mahout's predefined clusters for each
document returned in the search results. Getting this to work is a matter
of implementing a dedicated SearchClusteringEngine
(http://lucene.apache.org/solr/api/org/apache/solr/handler/clustering/SearchClusteringEngine.html)
and should be fairly straightforward, at least in terms of interaction with
Solr.
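Here is a minimal sketch of the lookup idea described here, written in plain Python rather than against Solr's SearchClusteringEngine API, whose exact method signatures are not reproduced. It assumes a hypothetical cluster_of table mapping document ids to cluster labels, exported from an offline Mahout clustering run, and simply groups a ranked result list by those labels.

```python
from collections import defaultdict


def group_by_predefined_clusters(result_ids, cluster_of):
    """Group ranked search-result ids by precomputed cluster labels.

    result_ids: ranked list of document ids returned by a query.
    cluster_of: {doc_id: cluster_label}, a hypothetical lookup table
                exported from an offline Mahout clustering run.
    Returns a list of (label, [doc_ids]) pairs, largest cluster first.
    """
    clusters = defaultdict(list)
    for doc_id in result_ids:
        # Documents missing from the table (e.g. indexed after the
        # offline run) go into a catch-all bucket.
        label = cluster_of.get(doc_id, "unclustered")
        # Appending in result order keeps the ranking within each cluster.
        clusters[label].append(doc_id)
    # Largest clusters first; sorted() is stable (also with reverse=True),
    # so equal-sized clusters keep the order in which they first appeared.
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
```

For example, results ["d3", "d1", "d7", "d2"] with labels {"d1": "C-1", "d2": "C-1", "d3": "C-2"} give [("C-1", ["d1", "d2"]), ("C-2", ["d3"]), ("unclustered", ["d7"])]. The catch-all bucket is one simple answer to the new-documents question raised above; a real engine might instead assign such documents to the nearest existing cluster.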

Staszek


Re: Mahout Solr

2011-06-15 Thread Sean Owen
Hmm, I suppose I have the same question from the Mahout side (I didn't
write that text). I would certainly call this far more related to
Hadoop than Lucene. There are some Lucene touch-points, but no
direct connection to Solr that I'm aware of.

If I'm not wildly mistaken then I can edit the wiki.
