Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-10 Thread Grant Ingersoll
Right, a SearchComponent wrapper around some of the Solr Cell capabilities could make this so. On Aug 9, 2009, at 11:21 AM, Jay Hill wrote: Solr Cell definitely sounds like it has a place here. But wouldn't it be needed as an extracting component earlier in the process for the MoreLikeT

Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-09 Thread Jay Hill
Solr Cell definitely sounds like it has a place here. But wouldn't it be needed as an extracting component earlier in the process for the MoreLikeThisHandler? The MLT Handler works great when it's directed to a content stream of plain text. If we could just use Solr Cell to identify the file ty
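Because the MLT handler behaves well on plain text, one workaround is to strip the HTML on the client before sending it. The sketch below is a crude stand-in for Solr Cell's extraction, not Solr Cell itself. It uses only Python's standard `html.parser`, keeps visible text, and drops `<script>` and `<style>` contents. The names `html_to_text` and `_TextExtractor` are hypothetical helpers.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and skips <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; are decoded.
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Crude HTML-to-text conversion (an assumption, not Solr Cell's behavior)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting text could then be posted to the MLT handler as a plain-text content stream (for example via `stream.body`). Unlike a real extractor, this makes no attempt to drop navigation, ads, or other page chrome, which is often where the low-quality terms come from.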

Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-09 Thread Grant Ingersoll
It's starting to sound like Solr Cell needs a SearchComponent as well, one that can come before the QueryComponent and can be used to map into the other components. Essentially, take the functionality of the extractOnly option and have it feed other SearchComponents. On Aug 8, 2009, at 10:42 A

Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-08 Thread Ken Krugler
On Aug 7, 2009, at 5:23pm, Jay Hill wrote: I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.

MoreLikeThis: How to get quality terms from html from content stream?

2009-08-07 Thread Jay Hill
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not su
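The request above passes the page address unencoded as `stream.url`. That works here only because the page's own query string contains no `&`. A safer approach is to URL-encode the parameters. The sketch below rebuilds the same request using the handler path and parameters from the message (`stream.url`, `mlt.fl`, `rows`, `debugQuery`). The function name `build_mlt_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_mlt_url(base: str, page_url: str, field: str,
                  rows: int = 4, debug: bool = True) -> str:
    """Hypothetical helper: build an MLT request that streams page_url.

    urlencode percent-encodes each value, so any '?' or '&' inside
    page_url cannot be mistaken for parameters of the Solr request itself.
    """
    params = {"stream.url": page_url, "mlt.fl": field, "rows": rows}
    if debug:
        params["debugQuery"] = "true"
    return base + "?" + urlencode(params)
```

For example, `build_mlt_url("http://localhost:8080/solr/mlt", "http://www.sfgate.com/cgi-bin/article.cgi?f=...", "body")` produces an equivalent request with the page address percent-encoded. Encoding the URL fixes only the transport. The handler still sees raw HTML, so the term-quality problem discussed in this thread remains.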