Re: help implementing a couple of business rules
For your first question, wouldn't it be possible to achieve that with some simple boolean logic? I mean, if you have a requirement to match any of the other fields AND description2, but not if it ONLY matches description2: say you are matching x against fields A, B, and description2:

((A:x OR B:x) AND description2:x)

would only give you results from description2 IF there is also a match in one of the other two fields.

If I misunderstood your requirements, you should also note that Solr supports pure negative field matching as well, meaning that you CAN exclude results from a specific field entirely. From the wiki:

Pure negative queries (all clauses prohibited) are allowed.
-inStock:false finds all field values where inStock is not false

Hope that helps,
Aleks

On Mon, Jan 11, 2010 at 7:29 PM, Joe Calderon calderon@gmail.com wrote:

thx, but im not sure that covers all edge cases, to clarify

1. matching description2 is okay if other fields are matched too, but results matching only to description2 should be omitted
2. its okay to not match against the people field, but matches against the people field should only be phrase matches

sorry if i was unclear
--joe

On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote:

1. given a set of fields how to return matches that match across them but not just one specific one, ex im using a dismax parser currently but i want to exclude any results that only match against a field called 'description2'

One way could be to add an fq parameter to the request:

fq=-description2:(query)

2. given a set of fields how to return matches that match across them but on one specific field match as a phrase only, ex im using a dismax parser currently but i want matches against a field called 'people' to only match as a phrase

Doesn't setting pf=people accomplish this?

Erik
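A quick way to sanity-check a boolean expression against Joe's rule 1 is to write out its truth table. The sketch below is plain Python with no Solr involved. A, B and description2 are the placeholder fields from the example in this thread, and the second rule (description2 treated as optional) is only an assumption about what "okay if other fields are matched too" might mean, not something anyone in the thread proposed.

```python
from itertools import product

FIELDS = ("A", "B", "description2")


def strict_and(hits):
    """(A:x OR B:x) AND description2:x -- description2 is required."""
    return (hits["A"] or hits["B"]) and hits["description2"]


def optional_description2(hits):
    """(A:x OR B:x) -- description2 may match but is never sufficient.

    Hypothetical alternative, shown only for comparison.
    """
    return hits["A"] or hits["B"]


def truth_table(rule):
    """Map every (A, B, description2) hit combination to the rule's verdict."""
    table = {}
    for combo in product((False, True), repeat=len(FIELDS)):
        hits = dict(zip(FIELDS, combo))
        table[combo] = bool(rule(hits))
    return table


if __name__ == "__main__":
    for name, rule in (("strict_and", strict_and),
                       ("optional_description2", optional_description2)):
        print(name)
        for combo, verdict in truth_table(rule).items():
            print("  ", dict(zip(FIELDS, combo)), "->", verdict)
```

Both rules drop a document that matches only description2. They differ on a document that matches A or B but not description2: the AND form drops it, the optional form keeps it. Which one is wanted depends on how the requirement is read.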
Re: Yankee's Solr integration
They have probably added the logic for that server-side. Solr does not support these types of features, but they are easy to implement. Saving a search could be as easy as storing the selected query parameters. Creating an alert (or RSS feed) for it would then be a process on the server that executes those stored queries against Solr at regular intervals, formats the results as either RSS or an email, and ships that off to the client that subscribed.

Cheers,
Aleks

On Wed, Jan 6, 2010 at 3:12 PM, Nicolas Kern nico...@nicolaskern.fr wrote:

Hello everybody,

I was wondering how Yankee (http://www.yankeegroup.com/search.do?searchType=advancedSearch) managed to provide the possibility to create alerts, save searches, and generate an RSS feed out of a custom search using Solr. Do you have any idea?

Thanks a lot, best regards, and happy new year!
Nicolas
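The saved-search idea described above could be sketched roughly like this. It is an illustration only: the function names, the document keys ("id", "title") and the base URL are made up for the example, and nothing here reflects how Yankee actually built their feature. The scheduling and the HTTP call to Solr are left out; the sketch only shows turning stored parameters back into a select URL and formatting a list of result documents as RSS.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_select_url(solr_base, saved_params):
    """Rebuild the Solr select URL from the stored query parameters."""
    return f"{solr_base}/select?{urlencode(saved_params)}"


def docs_to_rss(title, link, docs):
    """Format result documents (dicts with hypothetical 'id'/'title' keys) as RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Saved search: {title}"
    for doc in docs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = str(doc.get("title", doc["id"]))
        ET.SubElement(item, "guid").text = str(doc["id"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    saved = {"q": "solr", "rows": 10}  # what the user "saved"
    url = build_select_url("http://localhost:8983/solr", saved)
    print(url)
    print(docs_to_rss("solr", url, [{"id": 42, "title": "Example hit"}]))
```

A cron job or similar scheduler would call something like this at regular intervals, fetch the results from the URL, and either publish the RSS or send it by email.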
Re: Facets and distributed search
Hi Yonik!

I've tried recreating the problem now to get some log output, and the problem just doesn't seem to be there anymore. This puzzles me a bit, as the problem WAS definitely there before. I've made one change, and that is to optimize the index on one of the servers. But should that impact this to such a significant extent?

The other thing I noticed was that I had set facet.mincount=0, which is obviously stupid in this case and might just be the problem here. Changing it to facet.mincount=1 made all queries fast again :)

Sorry for the stupid inquiry, I'll be sure to check my tests two or three times before posting similar issues again!

Cheers,
Aleks

On Mon, Jan 4, 2010 at 5:26 PM, Yonik Seeley yo...@lucidimagination.com wrote:

Something looks wrong... that type of slowdown is certainly not expected. You should be able to see both the main query and a sub-query in the logs... could you post an actual example?

-Yonik
http://www.lucidimagination.com

On Mon, Jan 4, 2010 at 4:15 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

Hi everyone!

I've posted a similar question earlier, but in a thread related to facets in general, so I thought I'd repost it here as a separate thread.

I have a faceted search that is very fast when I execute the query on a single Solr server, but significantly slower when executed in a distributed environment. The setback seems to be in the sharding of our data, and that puzzles me a little bit. I can't really see why Solr is so slow at doing this.

The scenario: let's say we have two servers (s1 and s2). If I query the following:

q=threadid:33&facet=true&facet.field=author&limit=-1&facet.mincount=0&rows=0

directly on either server, the response is lightning fast (10ms). So, in theory, I could query them directly, concat the results myself and get that done pretty fast.

But if I introduce the shards parameter, the response time booms to between 15000ms and 2ms!

shards=s1:8983/solr,s2:8983/solr

My initial thought is that I MUST be doing something wrong here. So I tried the following: run the query on server s1, with the shards param shards=s1:8983/solr. The response time goes from sub-10ms to between 5000ms and 1ms! Same results if I run the query on s2, and the same if I use shards=s2:8983/solr.

Is there really that much overhead in running a distributed facet field query with Solr? Has anyone else experienced this? On the other hand, running regular distributed queries without facets is lightning fast, so I can't really see that this is a network problem or anything either. I also tried running a facet query on s1 with s1 as the shards param, and that is still as slow as if the shards param pointed to a different server.

Any insight into this would be greatly appreciated! (I would like to avoid having to hack together our own solution for concatenating results.)

Cheers,
Aleks
Re: Optimize not having any effect on my index
Hey,

I managed to run it correctly after a few restarts. Don't really know what happened. I can't really see what this would have had to do with the compound file format though. But no, I'm not using the compound file format.

Cheers and thanks for your replies,
Aleks

On Mon, Dec 21, 2009 at 8:27 AM, gurudev suyalprav...@yahoo.com wrote:

Hi,

Are you using the compound file format? If yes, then have you set it properly in solrconfig.xml? If not, then change to:

<useCompoundFile>true</useCompoundFile> (this is 'false' by default)

under the tags <indexDefaults>...</indexDefaults> and <mainIndex>...</mainIndex>

Aleksander Stensby wrote:

Hey guys,

I'm getting some strange behavior here, and I'm wondering if I'm doing anything wrong. I've got an unoptimized index, and I'm trying to run the following command:

http://server:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false

I tried it first directly in the browser. It obviously took quite a bit of time, but once it was finished I saw no difference in my index. Same number of files, same size etc. So I tried with curl:

curl http://server:8983/solr/update --data-binary '<optimize/>' -H 'Content-type:text/xml; charset=utf-8'

No difference here either. Am I doing anything wrong? Do I need to issue a commit after the optimize? Any pointers would be greatly appreciated.

Cheers,
Aleks

--
View this message in context: http://old.nabble.com/Optimize-not-having-any-effect-on-my-index-tp26843094p26870653.html
Sent from the Solr - User mailing list archive at Nabble.com.
Facets and distributed search
Hi everyone!

I've posted a similar question earlier, but in a thread related to facets in general, so I thought I'd repost it here as a separate thread.

I have a faceted search that is very fast when I execute the query on a single Solr server, but significantly slower when executed in a distributed environment. The setback seems to be in the sharding of our data, and that puzzles me a little bit. I can't really see why Solr is so slow at doing this.

The scenario: let's say we have two servers (s1 and s2). If I query the following:

q=threadid:33&facet=true&facet.field=author&limit=-1&facet.mincount=0&rows=0

directly on either server, the response is lightning fast (10ms). So, in theory, I could query them directly, concat the results myself and get that done pretty fast.

But if I introduce the shards parameter, the response time booms to between 15000ms and 2ms!

shards=s1:8983/solr,s2:8983/solr

My initial thought is that I MUST be doing something wrong here. So I tried the following: run the query on server s1, with the shards param shards=s1:8983/solr. The response time goes from sub-10ms to between 5000ms and 1ms! Same results if I run the query on s2, and the same if I use shards=s2:8983/solr.

Is there really that much overhead in running a distributed facet field query with Solr? Has anyone else experienced this? On the other hand, running regular distributed queries without facets is lightning fast, so I can't really see that this is a network problem or anything either. I also tried running a facet query on s1 with s1 as the shards param, and that is still as slow as if the shards param pointed to a different server.

Any insight into this would be greatly appreciated! (I would like to avoid having to hack together our own solution for concatenating results.)

Cheers,
Aleks
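For reference, the "query each server directly and concat the results myself" fallback mentioned above could look something like the sketch below. It assumes each server's facet counts for the author field arrive as the flat term/count list that Solr's JSON response writer produces by default (["alice", 3, "bob", 1, ...]); the function names and the mincount argument are made up for the example, and fetching the responses is left out.

```python
from collections import Counter


def flat_facets_to_counter(flat):
    """Turn ["alice", 3, "bob", 1] into Counter({"alice": 3, "bob": 1})."""
    return Counter(dict(zip(flat[0::2], flat[1::2])))


def merge_shard_facets(*flat_lists, mincount=1):
    """Sum per-term counts across servers and drop terms below mincount."""
    total = Counter()
    for flat in flat_lists:
        total.update(flat_facets_to_counter(flat))
    return {term: n for term, n in total.items() if n >= mincount}


if __name__ == "__main__":
    s1 = ["alice", 2, "bob", 0]    # facet_fields["author"] from s1
    s2 = ["alice", 1, "carol", 4]  # facet_fields["author"] from s2
    print(merge_shard_facets(s1, s2))
    print(merge_shard_facets(s1, s2, mincount=0))
```

With mincount=0 every term either server knows about survives the merge, including those with a zero count for this thread, so the merged list can be far larger than the set of authors actually present in the results.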
Optimize not having any effect on my index
Hey guys,

I'm getting some strange behavior here, and I'm wondering if I'm doing anything wrong. I've got an unoptimized index, and I'm trying to run the following command:

http://server:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false

I tried it first directly in the browser. It obviously took quite a bit of time, but once it was finished I saw no difference in my index. Same number of files, same size etc. So I tried with curl:

curl http://server:8983/solr/update --data-binary '<optimize/>' -H 'Content-type:text/xml; charset=utf-8'

No difference here either. Am I doing anything wrong? Do I need to issue a commit after the optimize? Any pointers would be greatly appreciated.

Cheers,
Aleks
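The curl call above can also be expressed with Python's standard library, which makes it easier to see exactly what body and header get sent. This is only a sketch: the function name is made up, the URL is the placeholder from the post, and the request is built but not sent (passing it to urllib.request.urlopen would do that).

```python
from urllib.request import Request


def build_optimize_request(update_url):
    """Build (but do not send) a POST carrying the <optimize/> update message."""
    body = "<optimize/>".encode("utf-8")
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(update_url, data=body, headers=headers)


if __name__ == "__main__":
    req = build_optimize_request("http://server:8983/solr/update")
    print(req.get_method(), req.full_url)
    print(req.data)
```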
Re: Can solr do the equivalent of select distinct(field)?
A follow-up question on this, Hoss:

If I have a set of documents, let's say this email thread. Each email has a unique author. All emails in the thread are indexed with threadid=33.

If I want to count the number of unique authors in this email thread, I could go along the lines you mention at the end:

rows=0&threadid=33&facet=true&facet.field=author&limit=-1

then count all returned facets. This works, but becomes unfeasible when the number of unique author values in the index is large. Right? So the limit=-1 solution just doesn't work for such fields, but it would work well for category if the number of unique categories is low. It's almost faster to retrieve all entries from the thread and count the number of unique authors programmatically. But obviously, I don't want to do that!

So, how would you go about finding the number of unique authors in this scenario?

Cheers,
Aleks

On Wed, Sep 2, 2009 at 12:57 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: lets say you filter your query on something and want to know how many
: distinct categories that your results comprise.
: then you can facet on the category field and count the number of facet
: values that are returned, right?

if you count the number of facet values returned you are getting a count of distinct values

if you just want the list of distinct values in a field (for your whole index) the TermsComponent is the fastest way.

if you want the list of distinct values across a set of documents, then facet on that field when doing your query.

select distinct category from books where bookInStock='true'

is analogous to looking at the facet section of...

rows=0&q=bookInStock:true&facet=true&facet.field=category

-Hoss
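The "count all returned facets" step from the message above is small enough to sketch. It assumes the facet values for the field come back as Solr's default flat JSON list of alternating term and count; the function name is made up. Terms with a zero count are skipped, since they belong to documents outside the thread.

```python
def count_distinct(flat_facets):
    """Count facet terms with at least one hit in a flat [term, count, ...] list."""
    counts = flat_facets[1::2]  # every second element is a count
    return sum(1 for n in counts if n > 0)


if __name__ == "__main__":
    authors = ["alice", 3, "bob", 0, "carol", 1]
    print(count_distinct(authors))
```

The client-side count is cheap. The cost is on the Solr side, where an unlimited facet on a field with many unique values has to produce and ship every term.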
Re: Can solr do the equivalent of select distinct(field)?
Forgot to add facet.mincount=1, obviously. But still, is this the only or preferred way of doing something along these lines? Or is there a different (better) approach?

Best regards,
Aleksander

On Thu, Dec 17, 2009 at 5:59 PM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

A follow-up question on this, Hoss:

If I have a set of documents, let's say this email thread. Each email has a unique author. All emails in the thread are indexed with threadid=33.

If I want to count the number of unique authors in this email thread, I could go along the lines you mention at the end:

rows=0&threadid=33&facet=true&facet.field=author&limit=-1

then count all returned facets. This works, but becomes unfeasible when the number of unique author values in the index is large. Right? So the limit=-1 solution just doesn't work for such fields, but it would work well for category if the number of unique categories is low. It's almost faster to retrieve all entries from the thread and count the number of unique authors programmatically. But obviously, I don't want to do that!

So, how would you go about finding the number of unique authors in this scenario?

Cheers,
Aleks

On Wed, Sep 2, 2009 at 12:57 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: lets say you filter your query on something and want to know how many
: distinct categories that your results comprise.
: then you can facet on the category field and count the number of facet
: values that are returned, right?

if you count the number of facet values returned you are getting a count of distinct values

if you just want the list of distinct values in a field (for your whole index) the TermsComponent is the fastest way.

if you want the list of distinct values across a set of documents, then facet on that field when doing your query.

select distinct category from books where bookInStock='true'

is analogous to looking at the facet section of...

rows=0&q=bookInStock:true&facet=true&facet.field=category

-Hoss
Re: Can solr do the equivalent of select distinct(field)?
Thanks for your reply Erik!

The speed of my suggested query is actually very fast once we add facet.mincount=1 (when searching within a limited set of documents). The setback seems to be in the sharding of our data, and that puzzles me a little bit. I can't really see why Solr is so slow at doing this.

The scenario: let's say we have two servers (s1 and s2). If I query the following:

q=threadid:33&facet=true&facet.field=author&limit=-1&facet.mincount=0&rows=0

directly on either server, the response is lightning fast (10ms). So, in theory, I could query them directly, concat the results myself and get that done pretty fast.

But if I introduce the shards parameter, the response time booms to between 15000ms and 2ms!

shards=s1:8983/solr,s2:8983/solr

My initial thought is that I MUST be doing something wrong here. So I tried the following: run the query on server s1, with the shards param shards=s1:8983/solr. The response time goes from sub-10ms to between 5000ms and 1ms! Same results if I run the query on s2, and the same if I use shards=s2:8983/solr.

Is there really that much overhead in running a distributed facet field query with Solr? Has anyone else experienced this? On the other hand, running regular distributed queries without facets is lightning fast, so I can't really see that this is a network problem or anything either. And it can't possibly be, as I tried running a facet query on s1 with s1 as the shards param, and that is still as slow as if the shards param pointed to a different server.

Any insight into this would be greatly appreciated! (I would like to avoid having to hack together our own solution for concatenating results.)

Cheers,
Aleks

On Thu, Dec 17, 2009 at 7:36 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

On Dec 17, 2009, at 11:59 AM, Aleksander Stensby wrote:

A follow-up question on this, Hoss:

If I have a set of documents, let's say this email thread. Each email has a unique author. All emails in the thread are indexed with threadid=33.

If I want to count the number of unique authors in this email thread, I could go along the lines you mention at the end:

rows=0&threadid=33&facet=true&facet.field=author&limit=-1

then count all returned facets. This works, but becomes unfeasible when the number of unique author values in the index is large. Right? So the limit=-1 solution just doesn't work for such fields, but it would work well for category if the number of unique categories is low. It's almost faster to retrieve all entries from the thread and count the number of unique authors programmatically. But obviously, I don't want to do that!

So, how would you go about finding the number of unique authors in this scenario?

One possible solution is tree faceting: https://issues.apache.org/jira/browse/SOLR-792

facet.tree=threadid,author

Could be a LARGE amount of data though!

Erik
Sorting on primitive types
Hey,

I have a question regarding the primitive type definitions and their use for sorting. I have an ID field in my index of type SortableLongField, and my test index holds about 2 million documents. When doing sort=id desc with q=*:*, I get an out of memory error (heap space). I'm running the instance with 2GB of memory, so I wouldn't really think that there should be any big problems here.

So I'm wondering if the Trie-based field types are less memory-expensive than the old SortableXXFields? Sorting on the date field (which is a TrieDateField) works fine (and fast).

Any input is highly appreciated!

Cheers,
Aleksander

--
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco
Please consider the environment before printing all or any of this e-mail
Re: Sorting on primitive types
Perfect, thanks a heap Yonik!

Cheers,
Aleks

On Mon, Sep 21, 2009 at 3:47 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Mon, Sep 21, 2009 at 3:30 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

So I'm wondering if the Trie-based field types are less memory-expensive than the old SortableXXFields? Sorting on the date field (which is a TrieDateField) works fine (and fast).

In general, yes (assuming there are many unique values - your ID field would qualify). SortableXXFields used the StringIndex (the only option in the past). The FieldCache entries for Trie* fields use long[maxDoc] for TrieLong and TrieDate.

-Yonik
http://www.lucidimagination.com
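To put rough numbers on Yonik's point, here is a back-of-the-envelope sketch. The long[maxDoc] figure follows directly from 8 bytes per document. The StringIndex side is a loose estimate only: it assumes an int[maxDoc] order array plus one Java String per unique term, and the per-string overhead (40 bytes) and the 2 bytes per character are assumed ballpark values, not measurements from any particular JVM.

```python
def trie_long_cache_bytes(max_doc):
    """long[maxDoc]: 8 bytes per document, independent of the values."""
    return max_doc * 8


def string_index_cache_bytes(max_doc, unique_terms, avg_chars,
                             per_string_overhead=40):
    """Rough StringIndex estimate: int[maxDoc] plus one String per unique term.

    per_string_overhead is an ASSUMED ballpark for object/array headers.
    """
    order = max_doc * 4                                       # int[maxDoc]
    lookup = unique_terms * (per_string_overhead + 2 * avg_chars)
    return order + lookup


if __name__ == "__main__":
    docs = 2_000_000  # index size mentioned in the thread
    print("TrieLong   :", trie_long_cache_bytes(docs), "bytes")
    # An ID field has one unique term per document; 10 chars is a guess.
    print("StringIndex:", string_index_cache_bytes(docs, docs, 10), "bytes")
```

Under these assumptions the string-based cache comes out several times larger than the long array for a field where every document has its own value, which is the case Yonik singles out. The exact ratio depends entirely on the assumed string sizes.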
Re: Trie Date question
Thanks for the reply Yonik!

I'm using the nightly from 2009-08-20, so it's a rather fresh build. And by comparing its schema with the one I'm using now, I can see that I had made a mistake when defining the field. By examining the most recent build, I noticed that the normal date field is defined as follows:

<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

(It's actually a TrieDateField? Does this mean that we are moving away from the standard SolrDateField?)

and that the tdate is specified as follows:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

I'll update my schema definitions and reindex :) Guess that pretty much will solve my problems.

Thanks!
Aleks

On Thu, Aug 27, 2009 at 3:47 PM, Yonik Seeley yo...@lucidimagination.com wrote:

I can't reproduce any problem. Are you using a recent nightly build? See the example schema of a recent nightly build for the correct way to define a Trie based field - the article / blog may be out of date.

Here's what I used to test the example data:

http://localhost:8983/solr/select?q=manufacturedate_dt:[NOW/DAY-4YEAR%20TO%20NOW/DAY]

-Yonik
http://www.lucidimagination.com

On Thu, Aug 27, 2009 at 3:49 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

Hello everyone,

after reading Grant's article about TrieRange capabilities on the lucid blog I did some experimenting, but I have some trouble with the tdate type and I was hoping that you guys could point me in the right direction.

So, basically I index a regular Solr date field and use that for sorting and range queries today. For experimenting I added a tdate field, indexing it with the same data as in my other date field, but I'm obviously doing something wrong here, because the results coming back are completely different.

The definitions in my schema:

<field name="datetime" type="date" indexed="true" stored="false" omitNorms="true"/>
<field name="tdatetime" type="tdate" indexed="true" stored="false"/>

So if I do a query on my test index:

q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=1031524 (don't worry about the ordering yet). Then, if I do the following on my trie date field:

q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=0.

Where did I go wrong? (And yes, both fields are indexed with exactly the same data.) Thanks for any guidance here!

Cheers,
Aleks
Re: Trie Date question
Hmm, seems I was one day too early with my nightly then :p

Quote from Chris (2009-08-20 17:04):

i changed it to be manufacturedate_dt since that fits with the existing scheme ... the data is all made up, but so is all the rest of our data.

It seems like lucene.apache.org is down at the moment, but I will try out the new example data once it's back up again, because even though I changed my schema definitions, the two fields still give back different results... :(

I'll keep you updated.

- Aleks

On Fri, Aug 28, 2009 at 9:33 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

Thanks for the reply Yonik!

I'm using the nightly from 2009-08-20, so it's a rather fresh build. And by comparing its schema with the one I'm using now, I can see that I had made a mistake when defining the field. By examining the most recent build, I noticed that the normal date field is defined as follows:

<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

(It's actually a TrieDateField? Does this mean that we are moving away from the standard SolrDateField?)

and that the tdate is specified as follows:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

I'll update my schema definitions and reindex :) Guess that pretty much will solve my problems.

Thanks!
Aleks

On Thu, Aug 27, 2009 at 3:47 PM, Yonik Seeley yo...@lucidimagination.com wrote:

I can't reproduce any problem. Are you using a recent nightly build? See the example schema of a recent nightly build for the correct way to define a Trie based field - the article / blog may be out of date.

Here's what I used to test the example data:

http://localhost:8983/solr/select?q=manufacturedate_dt:[NOW/DAY-4YEAR%20TO%20NOW/DAY]

-Yonik
http://www.lucidimagination.com

On Thu, Aug 27, 2009 at 3:49 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

Hello everyone,

after reading Grant's article about TrieRange capabilities on the lucid blog I did some experimenting, but I have some trouble with the tdate type and I was hoping that you guys could point me in the right direction.

So, basically I index a regular Solr date field and use that for sorting and range queries today. For experimenting I added a tdate field, indexing it with the same data as in my other date field, but I'm obviously doing something wrong here, because the results coming back are completely different.

The definitions in my schema:

<field name="datetime" type="date" indexed="true" stored="false" omitNorms="true"/>
<field name="tdatetime" type="tdate" indexed="true" stored="false"/>

So if I do a query on my test index:

q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=1031524 (don't worry about the ordering yet). Then, if I do the following on my trie date field:

q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=0.

Where did I go wrong? (And yes, both fields are indexed with exactly the same data.) Thanks for any guidance here!

Cheers,
Aleks
Re: Can solr do the equivalent of select distinct(field)?
But you could use facets to do something similar to a distinct where... Let's say you filter your query on something and want to know how many distinct categories your results comprise. Then you can facet on the category field and count the number of facet values that are returned, right? But maybe that's not what you are after...

Cheers,
Aleks

On Fri, Aug 28, 2009 at 11:22 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Fri, Aug 28, 2009 at 5:05 AM, Paul Tomblin ptomb...@xcski.com wrote:

Can I get all the distinct values from the Solr database, or do I have to select everything and aggregate it myself?

No, Solr has no way to do a distinct at query-time.

--
Regards,
Shalin Shekhar Mangar.
Trie Date question
Hello everyone,

after reading Grant's article about TrieRange capabilities on the lucid blog I did some experimenting, but I have some trouble with the tdate type and I was hoping that you guys could point me in the right direction.

So, basically I index a regular Solr date field and use that for sorting and range queries today. For experimenting I added a tdate field, indexing it with the same data as in my other date field, but I'm obviously doing something wrong here, because the results coming back are completely different.

The definitions in my schema:

<field name="datetime" type="date" indexed="true" stored="false" omitNorms="true"/>
<field name="tdatetime" type="tdate" indexed="true" stored="false"/>

So if I do a query on my test index:

q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=1031524 (don't worry about the ordering yet). Then, if I do the following on my trie date field:

q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=0.

Where did I go wrong? (And yes, both fields are indexed with exactly the same data.) Thanks for any guidance here!

Cheers,
Aleks
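When comparing the two fields it helps to send exactly the same range to both, properly URL-encoded, and only look at numFound. A minimal sketch of building those URLs follows. The base URL is a placeholder, the function name is made up, and rows=0 is added here only to keep the responses small; it is not part of the queries in the post.

```python
from urllib.parse import urlencode, parse_qs


def range_query_url(base, field, start="NOW/DAY-1YEAR", end="NOW/DAY"):
    """Build a select URL for field:[start TO end] with the query URL-encoded."""
    q = f"{field}:[{start} TO {end}]"
    return f"{base}/select?{urlencode({'q': q, 'rows': 0})}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # placeholder host
    for field in ("datetime", "tdatetime"):
        url = range_query_url(base, field)
        print(url)
        # Decoding the query string again shows what Solr will receive.
        print("  decodes to:", parse_qs(url.split("?", 1)[1])["q"][0])
```

If both URLs decode to the same range and the counts still differ, the difference is in how the two fields were defined or indexed rather than in the request.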