Batch Search Query

2013-03-28 Thread Mike Haas
Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr.

Re: Batch Search Query

2013-03-28 Thread Timothy Potter
Hi Mike, Interesting problem - here are some pointers on where to get started. For finding similar segments, check out Solr's More Like This support - it's built into the query request processing, so you just need to enable it with query params. There's nothing built in for doing batch queries
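As a rough illustration of "enable it with query params", the sketch below builds a standard `/select` request with the MoreLikeThis component switched on. The core URL, the `id` value, and the `text` field are hypothetical placeholders; `mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf` and `mlt.mindf` are Solr's MoreLikeThis parameters.

```python
from urllib.parse import urlencode

def build_mlt_url(base_url: str, doc_id: str, field: str = "text", count: int = 10) -> str:
    """Build a /select URL that also asks Solr for documents similar to doc_id.

    base_url, doc_id and field are hypothetical: adjust to your core and schema.
    """
    params = {
        "q": f"id:{doc_id}",  # the source segment to find matches for
        "mlt": "true",        # turn on the MoreLikeThis component
        "mlt.fl": field,      # field(s) to mine for "interesting terms"
        "mlt.count": count,   # similar documents to return per result
        # Sentences rarely repeat a word, so relax the term/doc frequency
        # cut-offs (Lucene's defaults are 2 and 5) or few terms survive.
        "mlt.mintf": 1,
        "mlt.mindf": 1,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"

print(build_mlt_url("http://localhost:8983/solr/segments", "seg-42"))
```

Lowering `mlt.mintf` matters mostly because each stored segment is short; with whole documents the defaults are usually reasonable.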

Re: Batch Search Query

2013-03-28 Thread Roman Chyla
Apologies if you already do something similar, but perhaps of general interest... One (different) approach to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you
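The snippet is cut off, so the exact scheme Roman had in mind is unknown. A well-known local fingerprinting algorithm is winnowing (Schleimer, Wilkerson and Aiken, 2003), sketched below under that assumption: hash every k-character substring, then keep only the minimum hash from each window of `w` consecutive hashes. The values of `k` and `w` are arbitrary illustrative choices.

```python
import zlib

def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring of the normalized text."""
    # Normalize: lowercase and drop whitespace/punctuation so formatting
    # differences don't change the fingerprint.
    norm = "".join(ch.lower() for ch in text if ch.isalnum())
    # crc32 is deterministic across runs (unlike Python's salted hash()).
    return [zlib.crc32(norm[i:i + k].encode("utf-8")) for i in range(len(norm) - k + 1)]

def winnow(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Winnowing: keep the minimum hash of every window of w consecutive k-gram hashes."""
    hashes = kgram_hashes(text, k)
    selected = set()  # positions chosen as fingerprints
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        if not window:  # text shorter than k: nothing to fingerprint
            break
        low = min(window)
        # Ties go to the rightmost minimum, as in the winnowing paper.
        offset = max(i for i, h in enumerate(window) if h == low)
        selected.add(start + offset)
    # Compare texts by fingerprint values; positions only dedupe selections.
    return {hashes[p] for p in selected}

a = winnow("The quick brown fox jumps over the lazy dog")
b = winnow("A quick brown fox jumps high")
print(sorted(a & b))  # shared fingerprints flag the overlapping text
```

Any two texts sharing a normalized run of at least `w + k - 1` characters are guaranteed to share a fingerprint, so candidate overlaps can be found by looking up fingerprint values (for example, indexed as terms) instead of comparing every pair of segments.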

Re: Batch Search Query

2013-03-28 Thread Mike Haas
Thanks for your reply, Roman. Unfortunately, the business has been running this way forever, so I don't think it would be feasible to switch to a whole-document store versus a segment store. Even then, if I understand you correctly, it would not work for our needs. I'm thinking because we don't care

Re: Batch Search Query

2013-03-28 Thread Walter Underwood
This might not be a good match for Solr, or for many other systems. It does seem like a natural fit for MarkLogic, which natively searches and selects over XML documents. Disclaimer: I worked at MarkLogic for a couple of years. wunder

Re: Batch Search Query

2013-03-28 Thread Mike Haas
Thanks, Timothy. Regarding your mention of MoreLikeThis, do you know what kind of algorithm it uses? My searching didn't reveal anything.
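Roughly, Lucene's MoreLikeThis (which Solr's MLT support wraps) is a term-selection heuristic: it takes the terms of the source document, drops those below a minimum term frequency or document frequency, scores the rest by tf × idf, keeps the top few (25 by default), and ORs them into an ordinary query. The sketch below imitates only that selection step, assuming the Lucene 4.x default IDF `1 + ln(numDocs / (docFreq + 1))`; it is an approximation for intuition, not Lucene's code.

```python
import math
from collections import Counter

def interesting_terms(tokens, doc_freq, num_docs, min_tf=2, min_df=5, max_terms=25):
    """Approximate MoreLikeThis term selection: top terms by tf * idf.

    tokens: analyzed terms of the source document
    doc_freq: term -> number of indexed documents containing it
    """
    tf = Counter(tokens)
    scored = []
    for term, freq in tf.items():
        if freq < min_tf:  # too rare within this document
            continue
        df = doc_freq.get(term, 0)
        if df < min_df:    # too rare across the index
            continue
        idf = 1.0 + math.log(num_docs / (df + 1))  # assumed classic TF-IDF idf
        scored.append((term, freq * idf))
    scored.sort(key=lambda ts: ts[1], reverse=True)
    return scored[:max_terms]  # these terms would be OR'd into the MLT query

tokens = ["solr"] * 3 + ["query"] * 2 + ["the"] * 4 + ["rare"]
df = {"solr": 2, "query": 10, "the": 99, "rare": 1}
print(interesting_terms(tokens, df, num_docs=100, min_df=1))
```

Because the result is just a weighted OR query, "similar" here means sharing distinctive terms, not any deeper notion of sentence similarity.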

Re: Batch Search Query

2013-03-28 Thread Roman Chyla
On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas mikehaas...@gmail.com wrote:
> Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole
sure, no arguing against that :)
> document store versus segments

Re: Batch Search Query

2013-03-28 Thread Mike Haas
I will definitely let you all know what we end up doing. I realized I forgot to mention something that might make what we do clearer. Right now we use SQL Server full-text search to get back fairly similar matches for each segment. We do this with some funky SQL stuff which I didn't write and haven't

Re: Batch Search Query

2013-03-28 Thread Chris Hostetter
: Now, what happens is a user will upload, say, a Word document to us. We then
: parse it and process it into segments. It very well could be 5000 segments
: or even more in that Word document. Each one of those ~5000 segments needs
: to be searched for similar segments in Solr. I’m not quite sure
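Since Solr has nothing built in for batch queries, one client-side option is to fan the per-segment searches out over a bounded thread pool. In the sketch below, `batch_search` and the `search_fn` callable are hypothetical: `search_fn` stands for whatever single-segment request you settle on (an MLT request, a plain text query, a fingerprint lookup) and is passed in so the batching logic stays independent of it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")

def batch_search(
    segments: Sequence[str],
    search_fn: Callable[[str], R],
    max_workers: int = 8,
) -> List[Tuple[str, R]]:
    """Run search_fn on every segment concurrently, preserving input order.

    max_workers caps how many requests hit the search server at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(search_fn, segments))
    return list(zip(segments, results))

# Stand-in search function for illustration; a real one would query Solr.
print(batch_search(["first segment", "second one"], len))
```

Returning `(segment, result)` pairs rather than a dict keeps duplicate segments from the same upload distinct; `max_workers` is the knob for trading latency on a 5000-segment document against load on the server.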