You might want to check http://wiki.apache.org/solr/TermVectorComponent.
It should give you the term vectors along with a lot of additional info, such as term frequencies, positions and offsets.
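As a rough illustration, the Java sketch below only builds the HTTP request URL for a single document's term vectors. It assumes the component is wired into a request handler named `/tvrh` (as in the Solr example `solrconfig.xml`), that the unique key field is `id`, and that the requested field was indexed with `termVectors="true"`. All three are assumptions to adjust to your own config, and the document id and field name are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: builds a TermVectorComponent request URL.
// ASSUMPTIONS: handler path "/tvrh", unique key field "id",
// and the tv.fl field indexed with termVectors="true".
final class TermVectorRequest {
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String build(String solrBase, String docId, String field) {
        return solrBase + "/tvrh"
                + "?q=" + enc("id:" + docId)   // fetch the one document by key
                + "&fl=id"                      // stored fields are not needed
                + "&tv=true"                    // turn the TermVectorComponent on
                + "&tv.tf=true"                 // include per-term frequencies
                + "&tv.fl=" + enc(field)        // field to return vectors for
                + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "doc1", "text"));
    }
}
```

With SolrJ, the same parameters can presumably be set on a `SolrQuery` with `set(name, value)`, and the `termVectors` section read from the raw `QueryResponse.getResponse()` list.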

Regards,
Jayendra

On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout
<gabri...@mysimpatico.com> wrote:
> Hello,
>
> This time I'm trying to duplicate Luke's functionality of knowing which
> terms occur in a search result/document (without parsing it again). Is
> there any SolrJ API to do that?
>
> P.S. I've also posted the question on Stack Overflow:
> http://stackoverflow.com/q/7219111/300248
>
> On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout
> <gabri...@mysimpatico.com> wrote:
>
>> From your patch I see TermFreqVector, which provides the information I
>> want.
>>
>> I also found FieldInvertState.getLength(), which seems to be exactly what I
>> want: I'm after the word count (the sum of tf over every term in the doc).
>> I'm just not sure whether FieldInvertState.getLength() returns only the
>> number of distinct terms or the word count (each term counted once per
>> occurrence). It seems to return the word count, but I haven't tested it
>> sufficiently.
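The difference between the two candidate meanings of "length" is easy to see on a small term vector. The sketch below uses two parallel arrays shaped like what Lucene 3.x's `TermFreqVector` returns from `getTerms()` and `getTermFrequencies()`, but as plain arrays so it runs without Lucene. The field text is a hypothetical example.

```java
// Sketch: "distinct terms" vs "word count" (sum of tf) for one field.
// The parallel arrays mirror TermFreqVector.getTerms() and
// getTermFrequencies(); plain arrays are used so this runs standalone.
final class TermCounts {
    /** One entry per distinct term in the field. */
    static int distinctTerms(String[] terms) {
        return terms.length;
    }

    /** Word count: sum of tf over every term in the field. */
    static long wordCount(int[] termFreqs) {
        long sum = 0;
        for (int tf : termFreqs) {
            sum += tf;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical field text: "to be or not to be"
        String[] terms = {"be", "not", "or", "to"};
        int[] freqs = {2, 1, 1, 2};
        System.out.println(distinctTerms(terms) + " distinct, "
                + wordCount(freqs) + " words");
    }
}
```

Indexing a small test document like this one and comparing `getLength()` against both numbers (4 vs 6 here) would settle which one it reports. Keep in mind that it counts analyzed tokens, so stopword removal changes the result.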
>>
>>
>> On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger
>> <the.apache.t...@gmail.com> wrote:
>>
>>> Gabriele,
>>>
>>> I created a patch that does this about a year ago; see
>>> https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr
>>> 1.4 and is based on the Document Reconstructor in Luke. The patch adds a
>>> link on the main Solr admin page to a docinspector page, which
>>> reconstructs a document given its uniqueid (required). Keep in mind that
>>> for non-stored fields you're only seeing what's "in" the index, not the
>>> original text.
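The general idea behind reconstructing a non-stored field can be sketched as follows: walk every term's postings, keep the entries for the requested document, and place each term at its recorded positions. This is not the SOLR-1837 or Luke code. The in-memory "index" (term, then doc id, then positions) is a hypothetical stand-in for a real Lucene index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of position-based document reconstruction over a toy index.
// HYPOTHETICAL structure: term -> docId -> positions of that term in the doc.
final class Reconstructor {
    static String reconstruct(Map<String, Map<Integer, int[]>> postings, int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, int[]>> e : postings.entrySet()) {
            int[] positions = e.getValue().get(docId);
            if (positions == null) {
                continue; // term does not occur in this doc
            }
            for (int pos : positions) {
                // Terms sharing a position (e.g. synonyms) are joined with '|';
                // their order follows the map's iteration order.
                byPosition.merge(pos, e.getKey(), (a, b) -> a + "|" + b);
            }
        }
        // Gaps (e.g. removed stopwords) simply disappear: the result is the
        // indexed token stream, not the original text.
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        // Hypothetical doc 1: "The quick brown fox", with "the" removed
        // as a stopword so position 0 stays empty.
        Map<String, Map<Integer, int[]>> postings = new HashMap<>();
        postings.put("quick", Map.of(1, new int[] {1}));
        postings.put("brown", Map.of(1, new int[] {2}));
        postings.put("fox", Map.of(1, new int[] {3}, 2, new int[] {0}));
        System.out.println(reconstruct(postings, 1));
    }
}
```

Running it prints the analyzed tokens (`quick brown fox`) rather than the original sentence, which is exactly the caveat about non-stored fields.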
>>>
>>> If you have any issues using this on the most recent release, let me know
>>> and I'd be happy to create a new patch for Solr 3.3. One of these days
>>> I'll remove the JSP dependency, and this may eventually make it into
>>> trunk.
>>>
>>> Thanks,
>>>
>>> -Trey Grainger
>>> Search Technology Development Team Lead, Careerbuilder.com
>>> Site Architect, Celiaccess.com
>>>
>>>
>>> On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
>>> <gabri...@mysimpatico.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > With an inverted index, the term is the key and the documents are the
>>> > values. Is it still possible, however, to get the terms indexed for a
>>> > given document id?
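To make the question concrete, the sketch below turns a toy inverted index (term, then doc, then tf) into a forward index (doc, then term, then tf) in one pass. Without some per-document structure like this, answering "which terms does doc N contain?" means scanning the postings of every term. Term vectors essentially store this forward view per document and field at index time. The map layout and sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: invert an inverted index into a forward index.
// HYPOTHETICAL layout: term -> docId -> term frequency in that doc.
final class ForwardIndex {
    static Map<Integer, SortedMap<String, Integer>> invert(
            Map<String, Map<Integer, Integer>> inverted) {
        Map<Integer, SortedMap<String, Integer>> forward = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> t : inverted.entrySet()) {
            for (Map.Entry<Integer, Integer> p : t.getValue().entrySet()) {
                // One sorted term -> tf map per document.
                forward.computeIfAbsent(p.getKey(), d -> new TreeMap<>())
                       .put(t.getKey(), p.getValue());
            }
        }
        return forward;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> inverted = new HashMap<>();
        inverted.put("apple", Map.of(1, 2, 2, 1));
        inverted.put("pie", Map.of(1, 1));
        System.out.println(invert(inverted).get(1)); // {apple=2, pie=1}
    }
}
```

Doing this over a whole real index is expensive, which is why storing term vectors (or reconstructing one document on demand, as Luke does) is the usual route.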
>>> >
>>> > --
>>> > Regards,
>>> > K. Gabriele
>>> >
>>> > --- unchanged since 20/9/10 ---
>>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
>>> > receipt within 48 hours then I don't resend the email.
>>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>>> > time(x) < Now + 48h) ⇒ ¬resend(I, this).
>>> >
>>> > If an email is sent by a sender that is not a trusted contact or the
>>> > email does not contain a valid code then the email is not received. A
>>> > valid code starts with a hyphen and ends with "X".
>>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>>> > L(-[a-z]+[0-9]X)).
>>> >
>>>
>>
>>
>>
>> --
>> Regards,
>> K. Gabriele
>>
>>
>>
>
>
> --
> Regards,
> K. Gabriele
>
>