Re: Tf-Idf for a specific query
Hi Erick,

Slower queries for facets can be tolerated, as long as they don't affect the queries that run without facets. The requirement is for a separate query that returns both term vectors and facet counts.

One issue I am facing is that, for a search query, I want only the term vectors and facet counts, not the results/docs. If I set rows=0, the term vectors are not returned. Could you suggest a way to achieve this?

It would also be helpful to have a way to get the aggregate TF of a term (across all docs matched by the query).

Regards,
David

On Sat, Feb 8, 2014 at 10:49 AM, Erick Erickson erickerick...@gmail.com wrote:

David: If you're, say, faceting on fields with lots of unique values, this will be quite expensive. No idea whether you can tolerate slower queries or not, just sayin'

Erick
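For the aggregate TF David asks about, one option is to sum the per-document term frequencies on the client side. The sketch below assumes the per-document frequencies have already been pulled out of a term vector response into a plain dictionary; that input shape, the document ids, and the helper name `aggregate_tf` are made up for illustration and are not part of any Solr API.

```python
from collections import Counter

def aggregate_tf(per_doc_tf):
    """Sum term frequencies over every document in the result set.

    per_doc_tf maps a document id to a {term: tf} dict (hypothetical shape,
    assumed to be extracted from the term vector output beforehand).
    """
    total = Counter()
    for term_freqs in per_doc_tf.values():
        # Counter.update adds the counts instead of replacing them.
        total.update(term_freqs)
    return total

# Made-up frequencies for three matching documents.
per_doc_tf = {
    "doc1": {"solr": 3, "facet": 1},
    "doc2": {"solr": 2},
    "doc3": {"facet": 4, "lucene": 1},
}
totals = aggregate_tf(per_doc_tf)
print(totals.most_common())
```

This only covers the documents whose term vectors were actually fetched, so it does not get around the rows=0 limitation described above.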
Re: Tf-Idf for a specific query
David: If you're, say, faceting on fields with lots of unique values, this will be quite expensive. No idea whether you can tolerate slower queries or not, just sayin'

Erick

On Fri, Feb 7, 2014 at 5:35 PM, David Miller davthehac...@gmail.com wrote:

Thanks Mikhail,

It seems that this was what I was looking for. Being new to this, I wasn't aware of such a use of facets. Now I can probably combine the term vectors and facets to fit my scenario.

Regards,
Dave
Re: Tf-Idf for a specific query
Hello Dave,

You can get DF from http://wiki.apache.org/solr/TermsComponent (invert it yourself). Then, for a certain term, you can get the number of occurrences per document with http://wiki.apache.org/solr/FunctionQuery#tf

On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote:

Hi Guys,

I need to obtain the Tf-idf score from Solr for a certain set of documents. The catch is that I need the IDF (or DF) to be calculated on the documents returned by a specific query, not on the entire corpus.

Please give me a hint on whether Solr has this feature, or whether I can use the Lucene API directly to achieve this.

Thanks in advance,
Dave

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
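As a rough illustration of the "invert it yourself" step, here is a minimal sketch that turns a DF into an IDF and combines it with a per-document TF. The plain log(N / df) form is an assumption (Lucene's own similarity uses a smoothed variant), and the field name `text`, the term `solr`, and the helper names are hypothetical. The two parameter strings show how the TermsComponent and the `tf()` function query might be requested; check them against your Solr version and handler configuration.

```python
import math
from urllib.parse import urlencode

def idf_from_df(df, num_docs):
    """Invert a document frequency into an IDF weight, here log(N / df)."""
    if df <= 0 or num_docs <= 0:
        raise ValueError("df and num_docs must be positive")
    return math.log(num_docs / df)

def tf_idf(tf, df, num_docs):
    """Combine a per-document term frequency with the inverted DF."""
    return tf * idf_from_df(df, num_docs)

# Assumed request parameters; 'text' and 'solr' are placeholder names.
# TermsComponent: corpus-wide DF for the terms of one field.
terms_params = urlencode({"terms": "true", "terms.fl": "text", "wt": "json"})
# Function query as a pseudo-field: occurrences of the term in each document.
tf_params = urlencode({"q": "text:solr", "fl": "id,tf(text,'solr')", "wt": "json"})

# A term occurring 3 times in a document and in 10 of 1000 documents overall.
score = tf_idf(tf=3, df=10, num_docs=1000)
print(terms_params)
print(tf_params)
print(score)
```

Note that the DF obtained this way is computed over the whole index, not over the results of a particular query.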
Re: Tf-Idf for a specific query
David,

I can imagine that DF for the result set is facets!

On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com wrote:

Hi Mikhail,

The DF seems to be based on the entire document set. What I require is a DF based on the results of a single query.

Suppose my Solr query returns a set of 50K documents from a superset of 10 million documents. I need to calculate the DF based on just those 50K documents, but currently it seems to be calculated on the entire doc set.

So, is there any way to get the DF or IDF based only on the docs returned by the query?

Regards,
Dave

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
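The facet idea can be sketched as follows: a facet count on a field is the number of matching documents that contain each term, which is the DF restricted to the result set, and numFound plays the role of N. The code assumes the default flat JSON facet layout ([term, count, term, count, ...]); the field name `text`, the sample numbers, and the helper names are made up, and log(N / df) is just one common IDF form.

```python
import math
from urllib.parse import urlencode

def facet_df_params(query, field):
    """Request parameters for facet counts only (no documents returned)."""
    return urlencode({
        "q": query,
        "rows": 0,             # facet counts are still computed with rows=0
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,     # all terms; can be expensive on high-cardinality fields
        "facet.mincount": 1,   # skip terms absent from the result set
        "wt": "json",
    })

def result_set_idf(response, field):
    """Compute IDF per term, scoped to the documents matched by the query."""
    num_found = response["response"]["numFound"]
    flat = response["facet_counts"]["facet_fields"][field]
    # The flat list alternates term, count, term, count, ...
    df = dict(zip(flat[0::2], flat[1::2]))
    return {term: math.log(num_found / count) for term, count in df.items() if count > 0}

# Hand-written sample in the assumed response shape, not real Solr output.
sample = {
    "response": {"numFound": 50000, "docs": []},
    "facet_counts": {"facet_fields": {"text": ["solr", 5000, "lucene", 500]}},
}
idf = result_set_idf(sample, "text")
print(facet_df_params("text:search", "text"))
print(idf)
```

Requesting every term with facet.limit=-1 is the costly case for fields with many unique values, so a smaller limit may be a better starting point.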
Re: Tf-Idf for a specific query
Hi Mikhail,

The DF seems to be based on the entire document set. What I require is a DF based on the results of a single query.

Suppose my Solr query returns a set of 50K documents from a superset of 10 million documents. I need to calculate the DF based on just those 50K documents, but currently it seems to be calculated on the entire doc set.

So, is there any way to get the DF or IDF based only on the docs returned by the query?

Regards,
Dave

On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Hello Dave,

You can get DF from http://wiki.apache.org/solr/TermsComponent (invert it yourself). Then, for a certain term, you can get the number of occurrences per document with http://wiki.apache.org/solr/FunctionQuery#tf
Re: Tf-Idf for a specific query
Thanks Mikhail,

It seems that this was what I was looking for. Being new to this, I wasn't aware of such a use of facets. Now I can probably combine the term vectors and facets to fit my scenario.

Regards,
Dave

On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

David,

I can imagine that DF for the result set is facets!