Hi Mike,

Interesting problem - here are some pointers on where to get started.

For finding similar segments, check out Solr's MoreLikeThis support -
it's built into the query request processing, so you just need to enable it
with query params (mlt=true, plus mlt.fl to name the fields to compare).
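
As a rough sketch of what such a request could look like, here is a small
Python helper that builds the URL. The host, the core name (collection1),
the field name (text) and the id (seg-42) are placeholders I made up, so
swap in whatever your schema uses. It also assumes the segment is already
in the index and can be looked up by id.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, segment_id, field="text", count=5):
    """Build a /select URL that asks Solr for documents similar to one segment."""
    params = {
        "q": f"id:{segment_id}",  # the indexed segment to find neighbours for
        "mlt": "true",            # enable MoreLikeThis for this request
        "mlt.fl": field,          # field(s) used to judge similarity
        "mlt.count": count,       # similar documents returned per result
        "mlt.mintf": 1,           # segments are short, so keep terms that occur once
        "mlt.mindf": 1,           # likewise, keep terms found in only one document
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core name and segment id, for illustration only.
url = build_mlt_url("http://localhost:8983/solr/collection1", "seg-42")
print(url)
```

I lowered mlt.mintf and mlt.mindf because sentence-sized segments rarely
repeat a term; treat those values as a starting point to tune, not a
recommendation.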

There's nothing built in for doing batch queries from the client side. You
might look into implementing a custom search component and registering it
under first-components in your search handler (take a look at
solrconfig.xml for how search handlers are configured, e.g. /browse).

Cheers,
Tim


On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas <mikehaas...@gmail.com> wrote:

> Hello. My company is currently thinking of switching over to Solr 4.2,
> coming off of SQL Server. However, what we need to do is a bit weird.
>
> Right now, we have ~12 million segments and growing. Usually these are
> sentences but can be other things. These segments are what will be stored
> in Solr. I’ve already done that.
>
> Now, what happens is a user will upload say a word document to us. We then
> parse it and process it into segments. It very well could be 5000 segments
> or even more in that word document. Each one of those ~5000 segments needs
> to be searched for similar segments in solr. I’m not quite sure how I will
> do the query (whether proximate or something else). The point though, is to
> get back similar results for each segment.
>
> However, I think I’m seeing a bigger problem first. I have to search
> against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m
> pretty sure that would take a LOT of hardware. Keep in mind this could be
> happening with maybe 4 different users at once right now (and of course
> more in the future). Is there a good way to send a batch query over one (or
> at least a lot fewer) http requests?
>
> If not, what kinds of things could I do to implement such a feature (if
> feasible, of course)?
>
>
> Thanks,
>
> Mike
>
