Re: Tf-Idf for a specific query

2014-02-11 Thread David Miller
Hi Erick,

Slower queries for getting facets can be tolerated, as long as they don't
affect the queries without facets. The requirement is for a separate query
that returns both term vectors and facet counts.

One issue I am facing is that, for a search query, I want only the term
vectors and facet counts, not the results/docs. If I set rows=0, the term
vectors are not returned. Could you suggest a way to achieve this?

Also, it would be helpful to have a way to get the aggregate TF of a term
(across all docs matching the query).
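
One possible way to approximate the aggregate TF is sketched below. It is
only an illustration under stated assumptions, not a confirmed answer from
this thread: it asks Solr for the termfreq() function as a pseudo-field in
fl and sums the values on the client. The alias term_count, the field name
text, the query, and the sample response are all hypothetical.

```python
from urllib.parse import urlencode


def aggregate_tf_params(query, field, term, rows=1000):
    """Build request parameters that return only the document id and the
    raw per-document term count (hypothetical alias: term_count)."""
    return {
        "q": query,
        "rows": rows,
        # termfreq(field,'term') is a Solr function query; the alias
        # "term_count" is a name chosen for this sketch.
        "fl": "id,term_count:termfreq({},'{}')".format(field, term),
        "wt": "json",
    }


def aggregate_tf(response):
    """Sum the per-document counts over the documents in one response page."""
    docs = response["response"]["docs"]
    return sum(doc.get("term_count", 0) for doc in docs)


# Hypothetical response page, shaped like Solr's JSON output.
sample_response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "1", "term_count": 2},
            {"id": "2", "term_count": 0},
            {"id": "3", "term_count": 5},
        ],
    }
}

params = aggregate_tf_params("title:solr", "text", "solr")
query_string = urlencode(params)
total = aggregate_tf(sample_response)
print(query_string)
print(total)
```

For a large result set the sum would have to be accumulated over several
pages (start/rows), so this is likely to be slow on something like 50K
documents.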

Regards,
David








Re: Tf-Idf for a specific query

2014-02-08 Thread Erick Erickson
David:

If you're, say, faceting on fields with lots of unique values, this
will be quite expensive.
No idea whether you can tolerate slower queries or not, just sayin'

Erick




Re: Tf-Idf for a specific query

2014-02-07 Thread Mikhail Khludnev
Hello Dave,

You can get DF from the TermsComponent
(http://wiki.apache.org/solr/TermsComponent) and invert it yourself to get
IDF. Then, for a given term, you can get the number of occurrences per
document with the tf function query
(http://wiki.apache.org/solr/FunctionQuery#tf).
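
As a rough sketch of what inverting DF yourself could look like, the
snippet below uses the textbook formula idf = log(N / df) and multiplies it
by a per-document term frequency. The formula choice and all numbers are
assumptions made for illustration; Lucene's own similarity classes use
their own variants of this formula.

```python
import math


def idf(doc_freq, num_docs):
    """Textbook inverse document frequency: log(N / df)."""
    if doc_freq <= 0 or num_docs <= 0:
        raise ValueError("doc_freq and num_docs must be positive")
    return math.log(num_docs / doc_freq)


def tf_idf(term_freq, doc_freq, num_docs):
    """Combine a per-document term frequency with the IDF above."""
    return term_freq * idf(doc_freq, num_docs)


# Hypothetical numbers: a term found in 10,000 of 10,000,000 documents,
# occurring 3 times in the document being scored.
num_docs = 10_000_000
doc_freq = 10_000
score = tf_idf(3, doc_freq, num_docs)
print(score)
```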



On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote:

 Hi guys,

 I need to obtain TF-IDF scores from Solr for a certain set of documents.
 The catch is that I need the IDF (or DF) to be calculated on the
 documents returned by a specific query, not on the entire corpus.

 Please give me a hint on whether Solr has this feature, or whether I can
 use the Lucene API directly to achieve this.


 Thanks in advance,
 Dave




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Tf-Idf for a specific query

2014-02-07 Thread Mikhail Khludnev
David,

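I'd say that DF for a result set is just facet counts: faceting on the field
gives, for each term, the number of matching documents that contain it.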
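
A minimal sketch of this idea, under assumptions: it builds a facet request
with rows=0 and turns the facet counts into a result-set DF and a textbook
IDF (log(N / df), with N taken from numFound). The field name text, the
query, and the sample response are hypothetical; the response is shaped like
Solr's default JSON output, where each facet field is a flat list of
alternating terms and counts.

```python
import math
from urllib.parse import urlencode


def facet_df_params(query, field):
    """Request parameters that return facet counts but no documents."""
    return {
        "q": query,
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,     # all terms; costly on high-cardinality fields
        "facet.mincount": 1,   # skip terms absent from the result set
        "wt": "json",
    }


def result_set_df(response, field):
    """Turn the flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def result_set_idf(response, field):
    """Textbook IDF computed against the result set only."""
    num_found = response["response"]["numFound"]
    df = result_set_df(response, field)
    return {term: math.log(num_found / count) for term, count in df.items()}


# Hypothetical response for a query matching 50,000 documents.
sample_response = {
    "response": {"numFound": 50000, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "text": ["solr", 50000, "lucene", 12500, "facet", 500]
        }
    },
}

query_string = urlencode(facet_df_params("text:solr", "text"))
df = result_set_df(sample_response, "text")
idf = result_set_idf(sample_response, "text")
print(query_string)
print(df)
```

Requesting every term with facet.limit=-1 keeps the sketch simple, but a
smaller limit may be needed on fields with many unique values.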








Re: Tf-Idf for a specific query

2014-02-07 Thread David Miller
Hi Mikhail,

The DF seems to be based on the entire document set. What I require is a DF
based on the results of a single query.

Suppose my Solr query returns a set of 50K documents from a superset of
10 million documents. I need the DF to be calculated on just those 50K
documents, but currently it seems to be calculated on the entire doc set.

So, is there any way to get the DF or IDF based only on the docs
returned by the query?

Regards,
Dave










Re: Tf-Idf for a specific query

2014-02-07 Thread David Miller
Thanks Mikhail,

It seems this is what I was looking for. Being new to this, I wasn't
aware that facets could be used this way.

Now I can probably combine the term vectors and facets to fit my scenario.

Regards,
Dave

