Hi Steve,

----- Original Message ----
> From: Steven A Rowe <sar...@syr.edu>
> Subject: RE: Not storing, but highlighting from document sentences
>
> I think you can get what you want by doing the first stage retrieval, and
> then in the second stage, add required constraint(s) to the query for the
> matching docid(s), and change the AND operators in the original query to OR.
> Coordination will cause the best snippet(s) to rise to the top, no?

Right, right. So if the original query is: foo AND bar, I'd run it against the
main index, get top N hits, say N=10. Then I'd create another query:

+(foo OR bar) +articleID:(ORed list of top N article IDs from main results)

And then I'd use that to get enough "sentence docs" to have at least 1 of them
for each hit from the main index.

Hm, I wonder what happens when instead of simple foo AND bar you have a more
complex query with more elaborate grouping and such...

> Hmm, you'll want to run the second stage once for each hit from the first
> stage, though, unless you can afford to collect *all* hits and pull out each
> first stage's hit from the intermixed second stage results...

Wouldn't the above get me all sentences I need for top N hits from the main
result in a single shot, assuming I use high enough rows=NNN to minimize the
possibility of not getting even 1 sentence for any one of those top N hits?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

> Steve
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Wednesday, January 12, 2011 7:29 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Not storing, but highlighting from document sentences
> >
> > Hi Stefan,
> >
> > Yes, splitting in separate sentences (and storing them) is OK because
> > with a bunch of sentences you can't really reconstruct the original
> > article unless you know which order to put them in.
> >
> > Searching against the sentence won't work for queries like foo AND bar
> > because this should match original articles even if foo and bar are in
> > different sentences.
> >
> > Otis
> >
> > ----- Original Message ----
> > > From: Stefan Matheis <matheis.ste...@googlemail.com>
> > > To: solr-user@lucene.apache.org
> > > Sent: Wed, January 12, 2011 7:02:46 AM
> > > Subject: Re: Not storing, but highlighting from document sentences
> > >
> > > Otis,
> > >
> > > just interested in .. storing the full text is not allowed, but
> > > splitting up in separate sentences is okay?
> > >
> > > while you think about using the sentences only as secondary/additional
> > > source, maybe it would help to search in the sentences itself, or
> > > would that give misleading results in your case?
> > >
> > > Stefan
> > >
> > > On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic <
> > > otis_gospodne...@yahoo.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm indexing some content (articles) whose text I cannot store in
> > > > its original form for copyright reason. So I can index the content,
> > > > but cannot store it.
> > > > However, I need snippets and search term highlighting.
> > > >
> > > > Any way to accomplish this elegantly? Or even not so elegantly?
> > > >
> > > > Here is one idea:
> > > >
> > > > * Create 2 indices: main index for indexing (but not storing) the
> > > > original content, the secondary index for storing individual
> > > > sentences from the original article.
> > > >
> > > > * That is, before indexing an article, split it into sentences. Then
> > > > index the article in the main index, and index+store each sentence
> > > > in the secondary index. So for each doc in the main index there
> > > > will be multiple docs in the secondary index with individual
> > > > sentences. Each sentence doc includes an ID of the "parent" document.
> > > >
> > > > * Then run queries against the main index, and pull individual
> > > > sentences from the secondary index for snippet+highlight purposes.
> > > >
> > > > The problem I see with this approach (and there may be other ones
> > > > that I am not seeing yet) is with queries like foo AND bar. In this
> > > > case "foo" may be a match from sentence #1, and "bar" may be a match
> > > > from sentence #7. Or maybe "foo" is a match in sentence #1, and
> > > > "bar" is a match in multiple sentences: #7 and #10 and #23.
> > > >
> > > > Regardless, when a query is run against the main index, you don't
> > > > know where the match was, so you don't know which sentences to go
> > > > get from the secondary index.
> > > >
> > > > Does anyone have any suggestions for how to handle this?
> > > >
> > > > Thanks,
> > > > Otis
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > Lucene ecosystem search :: http://search-lucene.com/
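
For illustration, here is a rough sketch of the second-stage step discussed in this thread: building the "+(foo OR bar) +articleID:(...)" query from the top N article IDs, then splitting the intermixed sentence hits back out per article. It only handles a flat list of terms (not the more elaborate grouping wondered about above), does no query escaping, and assumes sentence hits come back as a score-ordered list of dicts carrying an articleID field. The function names and the hit format are hypothetical, not part of any Solr client API.

```python
from collections import defaultdict


def build_sentence_query(terms, article_ids, id_field="articleID"):
    """Build the second-stage query: any term, restricted to the given articles.

    Assumes `terms` is a flat list of plain words (no escaping is done here).
    """
    if not terms or not article_ids:
        raise ValueError("need at least one term and one article ID")
    term_clause = " OR ".join(terms)
    id_clause = " OR ".join(str(a) for a in article_ids)
    return f"+({term_clause}) +{id_field}:({id_clause})"


def group_sentences(sentence_hits, article_ids, id_field="articleID"):
    """Split intermixed sentence hits back out per first-stage article.

    `sentence_hits` is assumed to be a score-ordered list of dicts (hypothetical
    format). Returns (grouped, missing): `grouped` maps article ID to its
    sentence hits in the order received; `missing` lists the articles that got
    no sentence back, e.g. because rows was set too low.
    """
    wanted = set(article_ids)
    grouped = defaultdict(list)
    for hit in sentence_hits:
        parent = hit[id_field]
        if parent in wanted:
            grouped[parent].append(hit)
    missing = [a for a in article_ids if a not in grouped]
    return dict(grouped), missing


if __name__ == "__main__":
    print(build_sentence_query(["foo", "bar"], [12, 47, 93]))
```

The `missing` list is one way to deal with the rows=NNN concern: any article that ends up there could be re-queried on its own rather than raising rows for every request.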