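You might want to check the TermVectorComponent: http://wiki.apache.org/solr/TermVectorComponent
It should provide you with the term vectors for each returned document, along with additional info such as term frequency, document frequency, positions and offsets.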
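For illustration, a request to that component could be built roughly as below. This is only a sketch under assumptions: the `/tvrh` handler name is the one used in the example solrconfig.xml and may differ in your setup, the host, document id and field name (`content`) are made up, and the field has to be indexed with termVectors="true" for anything to come back. The `tv`, `tv.tf` and `tv.fl` parameters are the ones described on the wiki page; `term_vector_url` is just a helper name picked for this example.

```python
from urllib.parse import urlencode


def term_vector_url(base_url, doc_query, field, handler="/tvrh"):
    """Build a TermVectorComponent request URL for the documents matching doc_query.

    handler is an assumption: "/tvrh" is the name used in the example solrconfig.xml.
    """
    params = [
        ("q", doc_query),    # select the document(s), e.g. by unique key
        ("fl", "id"),        # stored fields to return in the normal result list
        ("tv", "true"),      # switch the term vector component on
        ("tv.fl", field),    # field whose term vectors are wanted
        ("tv.tf", "true"),   # include the term frequency of each term
        ("wt", "json"),      # response format
    ]
    return base_url.rstrip("/") + handler + "?" + urlencode(params)


# Hypothetical host, document id and field name.
print(term_vector_url("http://localhost:8983/solr", "id:doc1", "content"))
```

From Solrj the same parameters can be set on the query object. Summing the returned tf values for a document should then give the word count you were after earlier in the thread.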
Regards,
Jayendra

On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout <gabri...@mysimpatico.com> wrote:
> Hello,
>
> This time I'm trying to duplicate Luke's functionality of knowing which
> terms occur in a search result/document (w/o parsing it again). Any Solrj
> API to do that?
>
> P.S. I've also posted the question on
> SO <http://stackoverflow.com/q/7219111/300248>.
>
> On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout
> <gabri...@mysimpatico.com> wrote:
>
>> From your patch I see TermFreqVector, which provides the information I
>> want.
>>
>> I also found FieldInvertState.getLength(), which seems to be exactly what I
>> want. I'm after the word count (sum of tf for every term in the doc). I'm
>> just not sure whether FieldInvertState.getLength() returns the word count
>> or only the number of distinct terms (not multiplied by the frequency of
>> each term). It seems as if it returns the word count, but I've not tested
>> it sufficiently.
>>
>> On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger
>> <the.apache.t...@gmail.com> wrote:
>>
>>> Gabriele,
>>>
>>> I created a patch that does this about a year ago. See
>>> https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr
>>> 1.4 and is based upon the Document Reconstructor in Luke. The patch adds
>>> a link to the main solr admin page to a docinspector page which will
>>> reconstruct the document given a uniqueid (required). Keep in mind that
>>> you're only looking at what's "in" the index for non-stored fields, not
>>> the original text.
>>>
>>> If you have any issues using this on the most recent release, let me know
>>> and I'd be happy to create a new patch for solr 3.3. One of these days
>>> I'll remove the JSP dependency and this may eventually make it into trunk.
>>>
>>> Thanks,
>>>
>>> -Trey Grainger
>>> Search Technology Development Team Lead, Careerbuilder.com
>>> Site Architect, Celiaccess.com
>>>
>>> On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
>>> <gabri...@mysimpatico.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > With an inverted index the term is the key, and the documents are the
>>> > values. Is it still however possible that given a document id I get the
>>> > terms indexed for that document?
>>> >
>>> > --
>>> > Regards,
>>> > K. Gabriele
>>> >
>>> > --- unchanged since 20/9/10 ---
>>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
>>> > receipt within 48 hours then I don't resend the email.
>>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>>> > time(x) < Now + 48h) ⇒ ¬resend(I, this).
>>> >
>>> > If an email is sent by a sender that is not a trusted contact or the
>>> > email does not contain a valid code then the email is not received. A
>>> > valid code starts with a hyphen and ends with "X".
>>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>>> > L(-[a-z]+[0-9]X)).
>>
>> --
>> Regards,
>> K. Gabriele
>
> --
> Regards,
> K. Gabriele