Re: How do I this in Solr?
I haven't been able to work on it because of some other commitments. The MemoryIndex approach seems promising. Only thing I will have to check is the memory requirement as I have close to 2 million documents. Will let you know if I can make it work.

Thanks a lot!

--
Varun Gupta
RE: How do I this in Solr?
Hi Varun,

On 10/26/2010 at 11:26 PM, Varun Gupta wrote:
  "I will try to implement the two filters suggested by Steven and see how the performance matches up."

Have you made any progress?

I was thinking about your use case, and it occurred to me that you could get what you want by reversing the problem, using Lucene's MemoryIndex http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html. (As far as I can tell, this functionality -- i.e. standing queries a.k.a. routing a.k.a. filtering -- is not present in Solr.)

You can load your query (as a document) into a MemoryIndex, and then use each of your documents to query against it, something like (untested!):

    // Each stored document becomes a query whose terms are all required.
    Map<String,Query> documents = new HashMap<String,Query>();
    Analyzer analyzer = new WhitespaceAnalyzer();
    QueryParser parser = new QueryParser("content", analyzer);
    parser.setDefaultOperator(QueryParser.Operator.AND);
    documents.put("ID001", parser.parse("nokia n95"));
    documents.put("ID002", parser.parse("GPS"));
    documents.put("ID003", parser.parse("android"));
    documents.put("ID004", parser.parse("samsung"));
    documents.put("ID005", parser.parse("samsung android"));
    documents.put("ID006", parser.parse("nokia android"));
    documents.put("ID007", parser.parse("mobile with GPS"));

    // The user's search text is indexed as the single in-memory document.
    MemoryIndex index = new MemoryIndex();
    index.addField("content", "samsung android GPS", analyzer);

    for (Map.Entry<String,Query> entry : documents.entrySet()) {
      Query query = entry.getValue();
      if (index.search(query) > 0.0f) {
        String docId = entry.getKey();
        // Do something with the hits here ...
      }
    }

In the above example, the documents "samsung", "GPS", "android" and "samsung android" would be hits, and the other documents would not be, just as you wanted.

MemoryIndex is designed to be very fast for this kind of usage, so even hundreds of thousands of documents should be feasible.

Steve

-----Original Message-----
From: Varun Gupta [mailto:varun.vgu...@gmail.com]
Sent: Tuesday, October 26, 2010 11:26 PM
To: solr-user@lucene.apache.org
Subject: Re: How do I this in Solr?

Thanks everybody for the inputs.

Looks like Steven's solution is the closest one but will lead to performance issues when the query string has many terms.

I will try to implement the two filters suggested by Steven and see how the performance matches up.

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) scott@udngroup.com wrote:

I think you have to write a "yet exact match" handler yourself (I say "yet" because it's not quite the exact match we normally know). Steve's answer is quite near your request. You can do further work based on his solution.

As the last step, I suggest you strip all blanks from the query string and from each query result, and return only those results whose string length equals the query string's.

For example, given:

*query string = Samsung with GPS
*query results:
  result 1 = Samsung has lots of mobile with GPS
  result 2 = with GPS Samsung
  result 3 = GPS mobile with vendors, such as Sony, Samsung

they become:

*query string = SamsungwithGPS (length = 14)
*query results:
  result 1 = SamsunghaslotsofmobilewithGPS (length = 29)
  result 2 = withGPSSamsung (length = 14)
  result 3 = GPSmobilewithvendors,suchasSony,Samsung (length = 39)

so result 2 matches your request.

In this way, you can avoid a lot of the work of handling case sensitivity and word reordering. Furthermore, you can refine this, for example by removing other whitespace characters.

Scott @ Taiwan
Re: How do I this in Solr?
There is also a feature called a 'filter'. If you use certain words a lot, you can make filter queries with just those words. Look for 'filter' and 'fq=' on the wiki.

But really you can have hundreds of words in a query and not have a performance problem. Solr/Lucene is very fast. In benchmarking I have trouble sending enough requests to make several processors run at the same time.
RE: How do I this in Solr?
You might try adding a field containing the word count and making sure that matches the query's word count? This would require you to tokenize the query and document yourself, perhaps.

-Mike
RE: How do I this in Solr?
I'm pretty sure the word-count strategy won't work.

Varun wrote:
  "If I search with the text samsung andriod GPS, search results should only contain samsung, GPS, andriod and samsung andriod."

Using the word-count strategy, a document containing "samsung andriod PDQ" would be a hit, but Varun doesn't want it, because it contains a word that is not in the query.

Steve
Re: How do I this in Solr?
Right - my point was to combine this with the previous approaches to form a query like:

  samsung AND android AND GPS AND word_count:3

in order to exclude documents containing additional words. This would avoid the combinatoric explosion problem others had alluded to earlier. Of course this would fail because android is misspelled :)

-Mike
Re: How do I this in Solr?
That does not work either, as it requires that all the terms in the query are present in the document. The original poster did not state this requirement. On the contrary, his examples were mostly single-word matches, implying an OR-search at the core.

The query-explosion still seems like the only working idea. Maybe Varun could comment on the maximum number of terms that his queries will contain?

Regards,
Toke Eskildsen
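The objection above can be checked without Solr. The following is only a rough sketch in plain Java that emulates the two rules on whitespace-split words; the class and method names are made up for illustration, and real Solr analysis (lowercasing, stopwords) is ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordCountStrategy {
    // Whitespace tokenization only; an assumption standing in for real analysis.
    static String[] tokens(String text) {
        return text.trim().split("\\s+");
    }

    static Set<String> wordSet(String text) {
        return new HashSet<>(Arrays.asList(tokens(text)));
    }

    // Emulates "t1 AND t2 AND ... AND word_count:n":
    // every query term must be in the document, and the word counts must be equal.
    public static boolean andWithWordCount(String query, String doc) {
        return wordSet(doc).containsAll(wordSet(query))
                && tokens(doc).length == tokens(query).length;
    }

    // The criterion from the original post:
    // every word of the document must occur in the query.
    public static boolean allDocWordsInQuery(String query, String doc) {
        return wordSet(query).containsAll(wordSet(doc));
    }

    public static void main(String[] args) {
        String query = "samsung android GPS";
        String[] docs = {"samsung", "samsung android", "samsung android GPS", "samsung android PDQ"};
        for (String doc : docs) {
            System.out.println(doc
                    + ": AND+word_count=" + andWithWordCount(query, doc)
                    + ", wanted=" + allDocWordsInQuery(query, doc));
        }
    }
}
```

Under these assumptions the document "samsung" is rejected by the AND plus word_count rule even though the original post wants it returned, which is the mismatch described above.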
Re: How do I this in Solr?
Yes I missed that requirement (as Steven also pointed out in a private e-mail). I now agree that the combinatorics are required.

Another possibility to consider (if the queries are large, which actually seems unlikely) is to use the default behavior where all terms are optional, sort by relevance, and truncate the result list on the client side after some unwanted term is found. I *think* the scoring should find only docs with the searched-for terms first, although if there are a lot of repeated terms maybe not? Also result counts will be screwy.

-Mike
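A client-side pass over the results of an all-terms-optional search could look roughly like the sketch below. It is plain Java with no Solr client involved; the result list is assumed to be already fetched as strings, and the names are hypothetical. Because the message above is unsure whether scoring always ranks the wanted documents first, this sketch filters the whole list rather than truncating at the first unwanted document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClientSideFilter {
    // Whitespace-split, lowercased words; an assumption about how text is normalized.
    static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.trim().toLowerCase().split("\\s+")));
    }

    // Keeps, in their original order, only the results whose every word occurs in the query.
    public static List<String> keepContained(String query, List<String> rankedResults) {
        Set<String> queryWords = words(query);
        List<String> kept = new ArrayList<>();
        for (String doc : rankedResults) {
            if (queryWords.containsAll(words(doc))) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList(
                "samsung android", "nokia android", "GPS", "mobile with GPS",
                "samsung", "android", "nokia n95");
        System.out.println(keepContained("samsung android GPS", ranked));
    }
}
```

The cost is that every OR-match has to be pulled to the client, so the result counts reported by the server no longer mean much, as noted above.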
Re: How do I this in Solr?
Toke, the search query will contain 4-5 words on average (excluding the stopwords).

Mike, I don't care about the result count. Excluding the terms at the client side may be a good idea. Is there any way to alter scoring such that the docs containing only the searched-for terms are shown first? Can I use term frequency to do this kind of thing?

--
Thanks
Varun Gupta
How do I this in Solr?
Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in Solr. For the search query, I want the search results to contain only those documents that satisfy this criterion: all of the words of the search result document are present in the search query.

For example, if I have the following documents indexed: "nokia n95", "GPS", "android", "samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should only contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta
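To pin down the requirement in the question, here is a minimal sketch in plain Java. It is not Solr or Lucene code, and the class and method names are hypothetical. It assumes whitespace tokenization, case-insensitive comparison, and the spelling "android" throughout; a document counts as a hit only when every one of its words also appears in the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the matching rule only; not a Solr API.
final class SubsetMatch {

    // Split on whitespace and lowercase, so "GPS" and "gps" compare equal (an assumption).
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // A document is a hit when its term set is a non-empty subset of the query's term set.
    static boolean isHit(String document, String query) {
        Set<String> docTerms = terms(document);
        return !docTerms.isEmpty() && terms(query).containsAll(docTerms);
    }

    // Brute-force scan over all documents, keeping the hits in their original order.
    static List<String> search(List<String> documents, String query) {
        List<String> hits = new ArrayList<>();
        for (String doc : documents) {
            if (isHit(doc, query)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "nokia n95", "GPS", "android", "samsung",
                "samsung android", "nokia android", "mobile with GPS");
        System.out.println(search(docs, "samsung android GPS"));
    }
}
```

A linear scan like this only states the rule; it says nothing about how to make the check efficient inside an index, which is what the rest of the thread is about.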
Re: How do I this in Solr?
If I get your question right, you probably want to use the AND binary operator, as in

samsung AND android AND GPS

or

+samsung +android +GPS
RE: How do I this in Solr?
Hi Varun,

I can't think of a way to do it without writing new analysis filters. But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the input tokens, sorted in a consistent way, e.g.:

   "mobile with GPS" -> "GPS mobile with"
   "samsung android" -> "android samsung"

2. A query-time filter that outputs one token per input term combination, sorted in the same consistent way as the index-time filter, e.g.:

   "samsung android GPS" -> "samsung", "android", "GPS", "android samsung", "GPS samsung", "android GPS", "android GPS samsung"

Steve
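The two-filter idea can be sketched in plain Java as follows. This is not a Lucene TokenFilter implementation; the class and method names are hypothetical, and the case-insensitive sort order is an assumption chosen to match the examples in the message. The index side collapses a document into one sorted token, and the query side emits one sorted token for every non-empty combination of query words, so a document is a hit when its single token is among the query tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two proposed filters; not Lucene or Solr code.
final class SortedTokenSketch {

    // Index-time side: all words of the document, sorted, joined into one token.
    static String indexToken(String text) {
        List<String> words = new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
        words.sort(String.CASE_INSENSITIVE_ORDER);
        return String.join(" ", words);
    }

    // Query-time side: one sorted token per non-empty subset of the query words.
    static Set<String> queryTokens(String query) {
        String[] words = query.trim().split("\\s+");
        if (words.length > 20) {
            // 2^n - 1 tokens are produced, so long queries get expensive quickly.
            throw new IllegalArgumentException("too many query terms for this sketch");
        }
        Set<String> tokens = new HashSet<>();
        for (int mask = 1; mask < (1 << words.length); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < words.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(words[i]);
                }
            }
            subset.sort(String.CASE_INSENSITIVE_ORDER);
            tokens.add(String.join(" ", subset));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> tokens = queryTokens("samsung android GPS");
        System.out.println(tokens.contains(indexToken("samsung android"))); // document is a hit
        System.out.println(tokens.contains(indexToken("mobile with GPS"))); // document is not a hit
    }
}
```

The number of query-time tokens grows as 2^n - 1 for n query words, which is why this approach gets expensive for queries with many terms.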
Re: How do I this in Solr?
On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis <savvas.andreas.moysi...@googlemail.com> wrote:

> If I get your question right, you probably want to use the AND binary operator as in samsung AND android AND GPS or +samsung +android +GPS

N.b. For these queries you can also pass the q.op parameter in the request to temporarily change the default operator to AND; this has the same effect without having to build the query. I.e., you can just pass

http://host:port/solr/select?q=samsung+android+gps&q.op=AND

as the query string (along with any other params you need).
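For anyone assembling that request in code, here is a small sketch that builds the URL with the standard library only. The class name, method name and base URL are placeholders, not part of any Solr client API. Note that q.op=AND only makes every query term required; it does not exclude documents that contain additional words.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the base URL passed in is a placeholder for your own host, port and core.
final class SolrAndQueryUrl {

    // URL-encode the raw user query and append q.op=AND so every term is required.
    static String selectUrl(String baseUrl, String userQuery) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(userQuery, "UTF-8") + "&q.op=AND";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "samsung android gps"));
    }
}
```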
RE: How do I this in Solr?
Overkill?

Dennis Gearon

> I can't think of a way to do it without writing new analysis filters. But I think you could do what you want with two filters (this is untested): [...]
Re: How do I this in Solr?
Um.. you could change your default operator to AND rather than OR. That should do the trick.

Matt
RE: How do I this in Solr?
Um, maybe I'm way off base, but when Varun said:

> If I search with the text samsung android GPS, search results should only contain samsung, GPS, android and samsung android.

I interpreted that to mean that hit documents should contain terms from the query, and nothing else. Making all terms required doesn't do this.

Steve
RE: How do I this in Solr?
Good point. Since I might need such a query myself someday, how *IS* that done?

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.
RE: How do I this in Solr?
Dennis,

Do you mean to say that you read my earlier post, and disagree that it would solve the problem? Or have you simply not read it?

Steve
RE: How do I this in Solr?
If Solr is like Google, once documents matching only the ANDed items in the query ran out, then those that had only two of the terms, then only 1 of the terms, and then those close to it would start showing up.

Is this correct? If so, it wouldn't match his requirements.

Dennis Gearon
RE: How do I this in Solr?
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right?

Dennis Gearon
RE: How do I this in Solr?
Hi Dennis,

You wrote:

> If Solr is like Google, once documents matching only the ANDed items in the query ran out, then those that had only two of the terms, then only 1 of the terms, and then those close to it would start showing up.
[...]
> Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: someone else asks a question, I answer with a possible solution, you give a one-word "overkill" response, I say why it's not overkill. You then ask if anybody knows the answer to the original question, and then parrot my response to your "overkill" statement.

Really. Get your shit together or shut up. Please.

Steve
RE: How do I this in Solr?
I'm the LAST person anyone will ever need to worry about flame baiting.

You did notice that I retracted what I said and supported your point of view?

Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)

Dennis Gearon
RE: How do I this in Solr?
Dennis,

I wasn't trying to force your admission of my rectitude - I was just getting frustrated that the conversation was moving in spiral fashion, and was worried that you might have intentionally engineered that. I'm glad to hear that you weren't flame baiting.

Steve

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:35 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?

I'm the LAST person anyone will ever need to worry about flame baiting. You did notice that I retracted what I said and supported your point of view? Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

--- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:
From: Steven A Rowe sar...@syr.edu
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org
Date: Tuesday, October 26, 2010, 12:27 PM

Hi Dennis,

You wrote:

    If Solr is like Google, once documents matching only the ANDed items in the query ran out, then those that had only two of the terms, then only 1 of the terms, and then those close to it would start showing up. [...]

    Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: Someone else asks a question, I answer with a possible solution, you give a one-word "overkill" response, I say why it's not overkill. You then ask if anybody knows the answer to the original question, and then parrot my response to your "overkill" statement.

Really. Get your shit together or shut up. Please.

Steve

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?

Dennis Gearon

--- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:
From: Steven A Rowe sar...@syr.edu
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org
Date: Tuesday, October 26, 2010, 12:10 PM

Dennis,

Do you mean to say that you read my earlier post, and disagree that it would solve the problem? Or have you simply not read it?

Steve

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:00 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?

Good point. Since I might need such a query myself someday, how *IS* that done?

Dennis Gearon

--- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:
From: Steven A Rowe sar...@syr.edu
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org
Date: Tuesday, October 26, 2010, 11:46 AM

Um, maybe I'm way off base, but when Varun said:

    If I search with the text "samsung android GPS", search results should only contain "samsung", "GPS", "android" and "samsung android".

I interpreted that to mean that hit documents should contain terms from the query, and nothing else. Making all terms required doesn't do this.

Steve

-----Original Message-----
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Tuesday, October 26, 2010 2:30 PM
To: solr-user@lucene.apache.org
Subject: Re: How
Re: How do I this in Solr?
Indeed, I'd missed the second part of his requirements; my AND solution is sadly insufficient for this task.

The combinatorial part of your solution worries me a bit though, Steven, because the documents on the larger side of his corpus would likely slow down query performance a bit while the filter calculates all of the possibilities for a given document.

I'm wondering if a slightly hybrid approach would be valid: have a filter that calculates the total number of terms for a given document, and then add a clause to your query at runtime that would match what the filter would come up with. So:

text:Nokia AND text:Mobile AND text:GPS AND termCount:3

Something like that anyhow.

Matt

On 10/26/2010 3:35 PM, Dennis Gearon wrote:
I'm the LAST person anyone will ever need to worry about flame baiting. [...]
Re: How do I this in Solr?
Bah.. nope this would miss documents that only match a subset of the given terms. I'm going to have to go with Steven's approach as the right choice here.

Matt

On 10/26/2010 3:44 PM, Matthew Hall wrote:
Indeed, I'd missed the second part of his requirements [...]
RE: How do I this in Solr?
Hi Matt,

I think your concern about performance is spot-on, though. The combinatorial explosion would be at query time, not at index time - my solution has a single token indexed per document.

My suggested query-time filter would generate the following number of output terms, where C(n,k) is the combination of n things taken k at a time, n is the number of input query terms, and k is the number of concatenated input query terms forming one output query term:

C(n,1) + C(n,2) + ... + C(n,n-1) + C(n,n)

For small queries this would not be a problem:

1 input query term -> 1 output query term
2 input query terms -> 3 output query terms
3 input query terms -> 7 output query terms
4 input query terms -> 15 output query terms

But for larger queries, it could be fairly expensive:

10 input query terms -> 1,023 output query terms
...
15 input query terms -> 32,767 output query terms

This is exactly (2^n - 1) output query terms, where n is the number of input terms. 32k query terms might be too slow to be functional.

Steve

-----Original Message-----
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Tuesday, October 26, 2010 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: How do I this in Solr?

Bah.. nope this would miss documents that only match a subset of the given terms. I'm going to have to go with Steven's approach as the right choice here.

Matt

On 10/26/2010 3:44 PM, Matthew Hall wrote:
Indeed, I'd missed the second part of his requirements [...]
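
The term counts in Steve's message can be checked with a small sketch of the two sides he describes: one sorted, concatenated token per document at index time, and one term per non-empty subset of the query words at query time. This is only an illustration in plain Java, not a Lucene TokenFilter; the class name, the method names and the "_" separator are assumptions, since the thread does not say how the tokens would be joined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubsetTerms {
    // Hypothetical separator; the thread does not specify one.
    private static final String SEP = "_";

    /** Index-time side: a single token built from all document tokens, sorted. */
    static String indexToken(String... tokens) {
        String[] sorted = tokens.clone();
        Arrays.sort(sorted);
        return String.join(SEP, sorted);
    }

    /** Query-time side: one term per non-empty subset of the query tokens. */
    static List<String> queryTerms(String... tokens) {
        String[] sorted = tokens.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        List<String> terms = new ArrayList<>();
        // Every bit pattern from 1 to 2^n - 1 selects one non-empty subset.
        for (long mask = 1; mask < (1L << n); mask++) {
            List<String> picked = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << i)) != 0) {
                    picked.add(sorted[i]);
                }
            }
            terms.add(String.join(SEP, picked));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = queryTerms("samsung", "android", "GPS");
        System.out.println(terms.size() + " query terms: " + terms);
        // A document matches when its single index token is one of the query terms.
        System.out.println(terms.contains(indexToken("android", "samsung")));
        System.out.println(terms.contains(indexToken("nokia", "android")));
    }
}
```

The loop makes the cost visible: the list doubles with every extra query word, which is where the 2^n - 1 figure comes from. The sketch does no lowercasing or de-duplication of tokens; a real filter chain would have to decide both.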
Re: How do I this in Solr?
I think you have to write a "yet exact match" handler yourself (I say "yet" because it's not quite the exact match we normally know). Steve's answer is quite near your request. You can do further work based on his solution. As the last step, I suggest you strip all blanks from the query string and from each query result, and return only those results whose string length equals the query string's. For example, given:

* query string = Samsung with GPS
* query results:
  result 1 = Samsung has lots of mobile with GPS
  result 2 = with GPS Samsung
  result 3 = GPS mobile with vendors, such as Sony, Samsung

they become:

* query string = SamsungwithGPS (length = 14)
* query results:
  result 1 = SamsunghaslotsofmobilewithGPS (length = 29)
  result 2 = withGPSSamsung (length = 14)
  result 3 = GPSmobilewithvendors,suchasSony,Samsung (length = 39)

so result 2 matches your request. This way you avoid a load of work handling case sensitivity and word reordering. You can refine this further, for example by removing other whitespace characters.

Scott @ Taiwan

----- Original Message -----
From: Varun Gupta varun.vgu...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 26, 2010 9:07 PM
Subject: How do I this in Solr?

Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in Solr. For the search query, I want the search results to contain only those documents that satisfy this criterion: all of the words of the search result document are present in the search query.

For example, if I have the following documents indexed: "nokia n95", "GPS", "android", "samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should only contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta
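
Varun's criterion (every word of the document must appear in the query) can be written down as a plain set-containment check. The sketch below is an in-memory reference for what the expected results look like, not something Solr provides; the class and method names are made up for illustration, and lowercasing plus whitespace splitting are assumptions about how the text would be analyzed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SubsetMatch {
    /** Splits text on whitespace and lowercases it (assumed analysis). */
    static Set<String> words(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        return new HashSet<>(Arrays.asList(normalized.split("\\s+")));
    }

    /** True when every word of the document also occurs in the query. */
    static boolean matches(String document, String query) {
        return words(query).containsAll(words(document));
    }

    public static void main(String[] args) {
        String query = "samsung android GPS";
        String[] documents = {
            "nokia n95", "GPS", "android", "samsung",
            "samsung android", "nokia android", "mobile with GPS"
        };
        for (String document : documents) {
            System.out.println(document + " -> " + matches(document, query));
        }
    }
}
```

With the sample data from the original question this accepts "GPS", "android", "samsung" and "samsung android" and rejects the other three. A check like this could serve as a reference when testing whatever filter-based solution ends up in the index.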
Re: How do I this in Solr?
Thanks everybody for the inputs.

Looks like Steven's solution is the closest one but will lead to performance issues when the query string has many terms. I will try to implement the two filters suggested by Steven and see how the performance matches up.

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) scott@udngroup.com wrote:
I think you have to write a yet exact match handler yourself [...]
Re: How do I get the solr error response as XML instead of HTML
: solr errors come back as HTML instead of XML or JSON
:
: Is it possible to get the response to come back as XML or JSON, or at
: least something I could show to an end user?

At the moment, Solr just relies on the Servlet Container to generate the error response, so you'd have to customize it at that level to get it formatted in XML or JSON.

There is an open issue to make Solr generate the error responses directly so the ResponseWriters could format them (SOLR-141) but there hasn't been a lot of demand for it.

-Hoss
How do I get the solr error response as XML instead of HTML
Solr errors come back as HTML instead of XML or JSON.

Is it possible to get the response to come back as XML or JSON, or at least something I could show to an end user? Is there a way to tell Solr to ignore unparseable terms and still return a result, ideally with a warning, so the end user doesn't get an error page?

GET 'http://localhost:8983/solr/select/?q=term+OR&wt=xml'

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 400 org.apache.lucene.queryParser.ParseException: Cannot parse 'term OR': Encountered &lt;EOF&gt; at line 1, column 7. Was expecting one of: &lt;NOT&gt; ... + ... - ... ( ... * ... &lt;QUOTED&gt; ... &lt;TERM&gt; ... &lt;PREFIXTERM&gt; ... &lt;WILDTERM&gt; ... &lt;REGEXPTERM&gt; ... [ ... { ... &lt;NUMBER&gt; ... &lt;TERM&gt; ... * ... </title>
</head>
<body><h2>HTTP ERROR 400</h2>
<p>Problem accessing /solr/select/. Reason:
<pre>org.apache.lucene.queryParser.ParseException: Cannot parse 'term OR': Encountered &lt;EOF&gt; at line 1, column 7. Was expecting one of: &lt;NOT&gt; ... + ... - ... ( ... * ... &lt;QUOTED&gt; ... &lt;TERM&gt; ... &lt;PREFIXTERM&gt; ... &lt;WILDTERM&gt; ... &lt;REGEXPTERM&gt; ... [ ... { ... &lt;NUMBER&gt; ... &lt;TERM&gt; ... * ... </pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
</body>
</html>
Re: How do I get the solr error response as XML instead of HTML
Scott,

Regarding unparseable terms - I think the edismax query parser is more forgiving than the standard one, but if that is not the case, one can always build a custom query parser that is more forgiving regarding invalid query string syntax.

Re HTML response - I'm guessing you are seeing something that looks like HTML to you in a browser. It should be XML. If it is not, please show us what you are seeing.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
From: Scott K s...@skister.com
To: solr-user@lucene.apache.org
Sent: Thu, October 7, 2010 8:07:35 PM
Subject: How do I get the solr error response as XML instead of HTML

solr errors come back as HTML instead of XML or JSON [...]
Re: How do I create a solr core with the data from an existing one?
: *Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
: of the core, and then swapping it in for the main core. I tried following
...
: The problem I am having is, the core created in step 1 doesn't have any data
: in it. If I am going to do a full index of everything and the kitchen sink,
: that would be fine, but if I just want to update a (large) subset of the
: documents - that's obviously not going to work.

That's really the point of that recommendation -- it's a way to completely rebuild without any downtime (the old core keeps serving requests until the new one is completely ready). If you are just updating some of the docs (even if it's a large "some") you should just update the existing core.

If you really want to clone the data in a core, then replication is really the only way to do that currently. Replicating to a query machine instead of having clients query the master you are updating directly is usually a good idea for lots of reasons -- but in this case you could always temporarily disable replication, make your large batch changes to the master, and then re-enable the replication so the query boxes only see the changes when they are all done.

-Hoss
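
For the "temporarily disable replication" route, the ReplicationHandler's HTTP API has disablereplication and enablereplication commands that are sent to the master. A minimal sketch of the two URLs follows; the base URL and the "/replication" handler path are assumptions that depend on how the handler is registered in solrconfig.xml, and the class name is made up for illustration.

```java
final class ReplicationToggle {
    /** Builds a ReplicationHandler command URL for the given master core. */
    static String commandUrl(String masterBaseUrl, String command) {
        // "/replication" is the conventional handler path; yours may differ.
        return masterBaseUrl + "/replication?command=" + command;
    }

    public static void main(String[] args) {
        String master = "http://localhost:8983/solr"; // assumed master location
        // Request this before starting the large batch update.
        System.out.println(commandUrl(master, "disablereplication"));
        // Request this once the batch has been committed.
        System.out.println(commandUrl(master, "enablereplication"));
    }
}
```

Either URL can be requested with any HTTP client (a browser, curl, or java.net.HttpURLConnection); the sketch only builds the strings.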
How do I create a solr core with the data from an existing one?
*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy of the core, and then swapping it in for the main core. I tried following these steps:

1. Create prep core: http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
2. Perform index update, then commit/optimize on prep core.
3. Swap main and prep core: http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
4. Unload prep core: http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep

The problem I am having is, the core created in step 1 doesn't have any data in it. If I am going to do a full index of everything and the kitchen sink, that would be fine, but if I just want to update a (large) subset of the documents - that's obviously not going to work. (I could merge the cores, but part of what I'm trying to do is get rid of any deleted documents without trying to make a list of them.)

Is there some flag to the CREATE action that I'm missing? The Solr Wiki page for CoreAdmin (http://wiki.apache.org/solr/CoreAdmin) is a little sparse on details.

Is this approach wrong? I found at least one message on this list that stated that performing updates in a separate core on the same machine won't help, given that they're both using the same CPU. Is that true?

thanks in advance
~stannius
Re: How do I create a solr core with the data from an existing one?
Hi,

I'm not sure if it's the best option but you could use replication to copy the index (http://wiki.apache.org/solr/SolrReplication). As long as your core is configured as a master you can use the fetchindex command to do a one-time replication from the new core (see the HTTP API section in the wiki page).

Regards,

gwk

On 3/24/2010 5:31 PM, Steve Dupree wrote:
*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy of the core, and then swapping it in for the main core. [...]
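
The one-time copy gwk describes could look roughly like the following: a fetchindex request sent to the new core, with masterUrl pointing at the existing core's replication handler. The core names "prep" and "main" come from the original question; the host, the "/replication" handler path on both cores, and the class name are assumptions, and the sketch only builds the URL rather than sending it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FetchIndexUrl {
    /**
     * Builds a fetchindex request for the target core that pulls the index
     * from the source core's replication handler.
     */
    static String fetchIndexUrl(String targetCoreUrl, String sourceCoreUrl) {
        // Assumes both cores register the handler at "/replication".
        String masterUrl = sourceCoreUrl + "/replication";
        return targetCoreUrl + "/replication?command=fetchindex&masterUrl="
                + URLEncoder.encode(masterUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fetchIndexUrl(
                "http://localhost:8983/solr/prep",
                "http://localhost:8983/solr/main"));
    }
}
```

The masterUrl value is URL-encoded because it is itself a URL carried inside a query string.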
Question: How do I run the solr analysis tool programtically ?
From Java code I want to contact Solr through HTTP and supply a text buffer (or a URL that returns text, whatever is easier), and get in return the final list of tokens (or the final text buffer) after it has gone through all the query-time filters defined for this Solr instance (stemming, stop words, etc.).

Thanks in advance

--
View this message in context: http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question: How do I run the solr analysis tool programtically ?
Hi Yatir,

The FieldAnalysisRequestHandler has the same behavior as the analysis tool. It will show you the list of tokens that are created after each of the filters have been applied. It can be used through normal HTTP requests, or you can use SolrJ's support.

Thanks,
Chris

On Thu, Sep 3, 2009 at 12:42 PM, Yatir yat...@outbrain.com wrote:
Form java code I want to contact solr through Http and supply a text buffer [...]
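
As a rough idea of the "normal HTTP requests" route, the handler takes analysis.fieldtype (or analysis.fieldname), analysis.fieldvalue and analysis.query parameters. The sketch below only builds such a URL; the "/analysis/field" path is where the example solrconfig.xml registers the handler and may differ in your setup, the field type "text" is an assumption, and the class name is made up. Because some versions may insist on analysis.fieldvalue being present, the same text is sent in both parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FieldAnalysisUrl {
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Builds a field-analysis request that runs the text through a field type's analyzers. */
    static String analysisUrl(String solrBaseUrl, String fieldType, String text) {
        // "/analysis/field" is assumed; check the handler name in solrconfig.xml.
        return solrBaseUrl + "/analysis/field"
                + "?analysis.fieldtype=" + enc(fieldType)
                + "&analysis.fieldvalue=" + enc(text)
                + "&analysis.query=" + enc(text);
    }

    public static void main(String[] args) {
        System.out.println(analysisUrl("http://localhost:8983/solr", "text", "running shoes"));
    }
}
```

The response lists the tokens after each analysis stage; the last stage under the query section is the "final list of tokens" asked for in the question.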