Shard tolerant partial results
Hi,

When doing distributed searches with shards.tolerant set while the hosts for a slice are down, and therefore the response is partial, how best can that be inferred? We would like to not cache the results upstream and perhaps inform the end user in some way. I am aware that shards.info could be used; however, I am concerned this may have performance implications due to the cost of parsing the response from Solr, and perhaps some extra cost incurred by Solr to generate the response. Perhaps an HTTP header could be added, or another attribute added to the Solr result node.

Phil

__
"brightsolid" is used in this email to collectively mean brightsolid online innovation limited and its subsidiary companies brightsolid online publishing limited and brightsolid online technology limited. findmypast.co.uk is a brand of brightsolid online publishing limited. brightsolid online innovation limited, Gateway House, Luna Place, Dundee Technology Park, Dundee DD2 1TP. Registered in Scotland No. SC274983. brightsolid online publishing limited, The Glebe, 6 Chapel Place, Rivington Street, London EC2A 3DQ. Registered in England No. 04369607. brightsolid online technology limited, Gateway House, Luna Place, Dundee Technology Park, Dundee DD2 1TP. Registered in Scotland No. SC161678.

Email Disclaimer

This message is confidential and may contain privileged information. You should not disclose its contents to any other person. If you are not the intended recipient, please notify the sender named above immediately. It is expressly declared that this e-mail does not constitute nor form part of a contract or unilateral obligation. Opinions, conclusions and other information in this message that do not relate to the official business of brightsolid shall be understood as neither given nor endorsed by it.
__
This email has been scanned by the brightsolid Email Security System. Powered by MessageLabs
__
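For what it's worth, recent Solr builds do mark this in-band: with shards.tolerant=true, the responseHeader carries a partialResults flag when one or more shards failed to answer (whether your version sets it may vary). A minimal client-side check might look like the sketch below; the two sample response dicts are made up for illustration:

```python
def is_partial(solr_response):
    """Return True if a tolerant distributed search came back incomplete.

    Assumes the wt=json response shape: Solr sets
    responseHeader.partialResults when shards.tolerant=true and
    at least one shard failed to respond.
    """
    header = solr_response.get("responseHeader", {})
    return bool(header.get("partialResults", False))

# Hypothetical responses, for illustration only:
complete = {"responseHeader": {"status": 0, "QTime": 12}}
partial = {"responseHeader": {"status": 0, "QTime": 30, "partialResults": True}}
```

An upstream cache could then simply skip storing any response for which is_partial() returns True.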
RE: Solr load balancer
Hi,

I have opened a couple of JIRAs: one to make the HttpShardHandlerFactory and LBHttpSolrServer more easily extended: https://issues.apache.org/jira/browse/SOLR-4448, and one with an implementation of a backup-requesting load balancer: https://issues.apache.org/jira/browse/SOLR-4449. The implementation does not attempt to cancel in-flight requests if a successful response is received; in fact it returns the successful response immediately, then allows the in-flight requests to complete. That way it can detect 'zombie' servers in a way similar to the current load balancer and not send them requests for a specified time.

Phil

-----Original Message-----
From: Jeff Wartes [mailto:jwar...@whitepages.com]
Sent: 01 February 2013 01:51
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

For what it's worth, Google has done some pretty interesting research into coping with the idea that particular shards might very well be busy doing something else when your query comes in. Check out this slide deck: http://research.google.com/people/jeff/latency.html

Lots of interesting ideas, but in particular, around slide 39 he talks about "backup requests", where you wait for something like your typical response time and then issue a second request to a different shard. You take whichever answer you get first, and cancel the other. The initial wait + cancellation means your extra cluster load is minimal, and you still get the benefit of reducing your p95+ response times if the first request was high-latency due to something unrelated to the query. (Say, GC.)

Of course, a central principle of this approach is being able to cancel a query and have it stop consuming resources. I'd love to be corrected, but I don't think Solr allows this. You can stop waiting for a response, but even the timeAllowed param doesn't seem to stop resource usage after the allotted time.
Meaning, a few exceptionally long-running queries can take out your high-throughput cluster by tying up entire CPUs for long periods.

Let me know the JIRA number, I'd love to see work in this area.

-----Original Message-----
From: Phil Hoy [mailto:p...@brightsolid.com]
Sent: Tuesday, January 29, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

Hi Erick,

Thanks, I have read the blogs you cited and I found them very interesting, and we have tuned the jvm accordingly but still we get the odd longish gc pause. That said, we perhaps have an unusual setup; we index a lot of small documents using servers with ssd's and 128 GB RAM in a sharded set-up with replicas, and our queries rely heavily on query filters and faceting with minimal free-text style searching. For that reason we rely heavily on the filter cache to improve query latency, therefore we assign a large percentage of available ram to the jvm hosting solr. Anyhow we are happy with the current configuration and performance profile, aside from the odd gc pause that is, and as we have index replicas it seems to me that we should be able to cope, hence my willingness to tweak how the load balancer behaves.

Thanks,
Phil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's a great place to start:
http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to multiple nodes in your cluster, you'll be effectively doubling (at least) the load on your system. Leading to more memory issues perhaps, in a "non-virtuous cycle".
FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query latency in the face of long gc pauses and the odd time-consuming query that we need to be able to support. At the moment setting the socket timeout via the HttpShardHandlerFactory does help, but of course it can only be set to a length of time as long as the most time consuming query we are likely to receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently to all/some replicas and only keeps the first response might be effective. Or maybe a load balancer which takes account of the frequency of timeouts would be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having to hack solr directly, I would need to be able to make the existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension, I can then override the default load balancer using solr's plugin mechanism.
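The "backup request" pattern from the slide deck can be sketched client-side with nothing more than a thread pool: send the primary request, wait roughly the typical response time, then fire a duplicate at another replica and take whichever answers first. This is only a toy illustration (the callables stand in for real HTTP requests to Solr replicas); note that, exactly as discussed in the thread, the losing request is not cancelled and keeps running:

```python
import concurrent.futures
import time

def backup_request(primary, backup, backup_after):
    """Run primary; if it hasn't answered within backup_after seconds,
    also fire backup. Return whichever result arrives first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(primary)]
    done, _ = concurrent.futures.wait(
        futures, timeout=backup_after,
        return_when=concurrent.futures.FIRST_COMPLETED)
    if not done:
        # Primary is slow (GC pause, overloaded shard...): fire the backup.
        futures.append(pool.submit(backup))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
    # The in-flight loser is NOT cancelled; it runs to completion,
    # mirroring the fact that Solr cannot stop an executing query.
    pool.shutdown(wait=False)
    return next(iter(done)).result()

# Simulated replicas: one stalled (e.g. mid-GC), one healthy.
def slow():
    time.sleep(0.5)
    return "slow-replica"

def fast():
    return "fast-replica"
```

For example, backup_request(slow, fast, 0.05) returns the healthy replica's answer after roughly the 50 ms backup delay rather than the full 500 ms stall.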
RE: Solr load balancer
Hi,

So am I correct in thinking that I add the JIRA myself, and if so can I add it to the 4.2 release? Also, I have further questions about the scope of my patch; should those be left to the comments of the JIRA itself?

Phil

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: 22 January 2013 17:25
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hi Phil,

Have a look at http://wiki.apache.org/solr/HowToContribute and thank you in advance! :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query latency in the face of long gc pauses and the odd time-consuming query that we need to be able to support. At the moment setting the socket timeout via the HttpShardHandlerFactory does help, but of course it can only be set to a length of time as long as the most time consuming query we are likely to receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently to all/some replicas and only keeps the first response might be effective. Or maybe a load balancer which takes account of the frequency of timeouts would be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having to hack solr directly, I would need to be able to make the existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension, I can then override the default load balancer using solr's plugin mechanism.
>
> So my question is, if I made a patch to make the load balancer more pluggable, is this something that would be acceptable and if so what do I do next?
>
> Phil
RE: Solr load balancer
Hi Erick,

Thanks, I have read the blogs you cited and I found them very interesting, and we have tuned the jvm accordingly but still we get the odd longish gc pause. That said, we perhaps have an unusual setup; we index a lot of small documents using servers with ssd's and 128 GB RAM in a sharded set-up with replicas, and our queries rely heavily on query filters and faceting with minimal free-text style searching. For that reason we rely heavily on the filter cache to improve query latency, therefore we assign a large percentage of available ram to the jvm hosting solr. Anyhow we are happy with the current configuration and performance profile, aside from the odd gc pause that is, and as we have index replicas it seems to me that we should be able to cope, hence my willingness to tweak how the load balancer behaves.

Thanks,
Phil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's a great place to start:
http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to multiple nodes in your cluster, you'll be effectively doubling (at least) the load on your system. Leading to more memory issues perhaps, in a "non-virtuous cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query latency in the face of long gc pauses and the odd time-consuming query that we need to be able to support. At the moment setting the socket timeout via the HttpShardHandlerFactory does help, but of course it can only be set to a length of time as long as the most time consuming query we are likely to receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently to all/some replicas and only keeps the first response might be effective. Or maybe a load balancer which takes account of the frequency of timeouts would be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having to hack solr directly, I would need to be able to make the existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension, I can then override the default load balancer using solr's plugin mechanism.
>
> So my question is, if I made a patch to make the load balancer more pluggable, is this something that would be acceptable and if so what do I do next?
>
> Phil
Solr load balancer
Hi,

I would like to experiment with some custom load balancers to help with query latency in the face of long gc pauses and the odd time-consuming query that we need to be able to support. At the moment setting the socket timeout via the HttpShardHandlerFactory does help, but of course it can only be set to a length of time as long as the most time consuming query we are likely to receive.

For example perhaps a load balancer that sends multiple queries concurrently to all/some replicas and only keeps the first response might be effective. Or maybe a load balancer which takes account of the frequency of timeouts would be able to recognize zombies more effectively.

To use alternative load balancer implementations cleanly and without having to hack solr directly, I would need to be able to make the existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension; I can then override the default load balancer using solr's plugin mechanism.

So my question is, if I made a patch to make the load balancer more pluggable, is this something that would be acceptable and if so what do I do next?

Phil
RE: multi-core sharing synonym map
Yes, I was thinking the same thing, although I was hoping there was a more elegant mechanism exposed by the solr infrastructure code to handle the shared map, aside from just using a global, that is.

Phil

-----Original Message-----
From: simon [mailto:mtnes...@gmail.com]
Sent: 12 October 2012 19:38
To: solr-user@lucene.apache.org
Subject: Re: multi-core sharing synonym map

I definitely haven't tried this ;=) but perhaps you could create your own XXXSynonymFilterFactory as a subclass of SynonymFilterFactory, which would allow you to share the synonym map across all cores - though I think there would need to be a nasty global variable to hold a reference to it...

-Simon

On Fri, Oct 12, 2012 at 12:27 PM, Phil Hoy wrote:
> Hi,
>
> We have a multi-core set-up with a fairly large synonym file. All cores share the same schema.xml and synonym file, but when solr loads the cores it loads multiple instances of the synonym map; this is a little wasteful of memory and lengthens the start-up time. Is there a way to get all cores to share the same map?
>
> Phil
RE: Unique terms without faceting
Hi,

I don't think you can use that component whilst taking into account any fq or q parameters.

Phil

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 10 October 2012 16:51
To: solr-user@lucene.apache.org
Subject: Re: Unique terms without faceting

The Solr TermsComponent: http://wiki.apache.org/solr/TermsComponent

-- Jack Krupansky

-----Original Message-----
From: Phil Hoy
Sent: Wednesday, October 10, 2012 11:45 AM
To: solr-user@lucene.apache.org
Subject: Unique terms without faceting

Hi,

I know that you can use a facet query to get the unique terms for a field taking account of any q or fq parameters, but for our use case the counts are not needed. So is there a more efficient way of finding just the unique terms for a field?

Phil
Unique terms without faceting
Hi, I know that you can use a facet query to get the unique terms for a field taking account of any q or fq parameters but for our use case the counts are not needed. So is there a more efficient way of finding just unique terms for a field? Phil
RE: trunk cloud ui not working
Hi,

I was using Windows 7, but it is fine with Chrome on Windows Web Server 2008 R2, and I asked a colleague with Windows 7 and it is fine for him too, so really sorry but I think it was a 'works on my machine' thing. Of course if I track down the cause I will reply to this email again.

Thanks,
Phil

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 21 May 2012 18:22
To: solr-user@lucene.apache.org
Subject: Re: trunk cloud ui not working

What OS? I was just trying trunk and looking at that view on Chrome on OSX and Linux and did not see an issue.

On May 21, 2012, at 1:15 PM, Phil Hoy wrote:
> After further investigation I have found that it is not a problem on firefox, only chrome and IE.
>
> Phil
>
> -----Original Message-----
> Sent: 21 May 2012 18:05
> To: solr-user@lucene.apache.org
> Subject: trunk cloud ui not working
>
> Hi,
>
> I am running from the trunk and the localhost:8983/solr/#/~cloud page shows nothing but "Fetch Zookeeper Data".
>
> If I run fiddler I see that:
> http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
> and
> http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes
> are called and return data but no update to the ui.
>
> Cheers,
> Phil

-
Mark Miller
lucidimagination.com
RE: trunk cloud ui not working
After further investigation I have found that it is not a problem on firefox, only chrome and IE.

Phil

-----Original Message-----
Sent: 21 May 2012 18:05
To: solr-user@lucene.apache.org
Subject: trunk cloud ui not working

Hi,

I am running from the trunk and the localhost:8983/solr/#/~cloud page shows nothing but "Fetch Zookeeper Data".

If I run fiddler I see that:
http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
and
http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes
are called and return data but no update to the ui.

Cheers,
Phil
RE: Custom Sharding on solrcloud
Hi,

If I remove the DistributedUpdateProcessorFactory I will have to manage a master-slave setup myself, by updating solely to the master and replicating to any slave. I wonder, is it possible to have distributed updates but confined to the subset of cores and replicas within a collection that share the same name?

Phil

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 08 March 2012 01:02
To: solr-user@lucene.apache.org
Subject: Re: Custom Sharding on solrcloud

Hi Phil -

The default update chain now includes the distributed update processor by default - and if in solrcloud mode it will be active. Probably, what you want to do is define your own update chain (see the wiki). Then you can add that update chain ("mychain") as the default for your json update handler in solrconfig.xml.

The default chain is:

new LogUpdateProcessorFactory(),
new DistributedUpdateProcessorFactory(),
new RunUpdateProcessorFactory()

So just use Log and Run instead to get your old behavior.

- Mark

On Mar 7, 2012, at 1:37 PM, Phil Hoy wrote:
> Hi,
>
> We have a large index and would like to shard by a particular field value, in our case surname. This way we can scale out to multiple machines, yet as most queries filter on surname we can use some application logic to hit just the one core to get the results we need.
>
> Furthermore, as we anticipate the index will grow over time, it makes sense (to us) to host a number of shards on a single machine until they get too big, at which point we can then move them to another machine.
>
> We are using solrcloud and it is set up using a solrcore per shard; that way we can direct both queries and updates to the appropriate core/shard.
> To do this our solr.xml looks a bit like this:
>
> zkClientTimeout="1" hostPort="8983"
> name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
> instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
> instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
> ...
>
> Directed updates via:
> http://server/solr/aaa-ava/update/json [{surname:"adams"}]
>
> Directed queries via:
> http://server/solr/select?surname:adams&shards=aaa-ava
>
> This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13 before the more recent solrcloud changes, but now the update is not directed to the appropriate core. Is there a better way to achieve our needs?
>
> Phil

- Mark Miller
lucidimagination.com
Custom Sharding on solrcloud
Hi,

We have a large index and would like to shard by a particular field value, in our case surname. This way we can scale out to multiple machines, yet as most queries filter on surname we can use some application logic to hit just the one core to get the results we need.

Furthermore, as we anticipate the index will grow over time, it makes sense (to us) to host a number of shards on a single machine until they get too big, at which point we can then move them to another machine.

We are using solrcloud and it is set up using a solrcore per shard; that way we can direct both queries and updates to the appropriate core/shard. To do this our solr.xml looks a bit like this:

...

Directed updates via:
http://server/solr/aaa-ava/update/json [{surname:"adams"}]

Directed queries via:
http://server/solr/select?surname:adams&shards=aaa-ava

This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13 before the more recent solrcloud changes, but now the update is not directed to the appropriate core. Is there a better way to achieve our needs?

Phil
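The application-side routing described above, picking the core whose surname range covers the document or query, can be sketched like this. The range-named cores (e.g. "aaa-ava", "avb-bel") mirror the message; the three-letter-prefix convention and the third shard name are assumptions for illustration:

```python
# Shards named "<first>-<last>" cover surnames whose lowercased
# three-letter prefix falls in that inclusive range.
SHARDS = ["aaa-ava", "avb-bel", "bem-zzz"]

def shard_for(surname):
    """Return the core name covering this surname."""
    key = surname.lower()[:3]
    for shard in SHARDS:
        lo, hi = shard.split("-")
        if lo <= key <= hi:
            return shard
    raise ValueError("no shard covers %r" % surname)

def update_url(server, surname):
    """Build the directed-update URL, as in the message above."""
    return "http://%s/solr/%s/update/json" % (server, shard_for(surname))
```

For example, a document with surname "adams" routes to the aaa-ava core, and queries can pass the same core name in the shards parameter.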
RE: removing cores solrcloud
Hi,

I have tried removing the entry from zookeeper as well as from solr via admin/cores?action=UNLOAD, and still the distributed query hits the missing core. I guess there is no zookeeper watcher in solr to update the core/shard state used by search. I got round the problem by doing the above and then running admin/cores?action=RELOAD on any core in the collection; this seems to force solr's distributed searcher to re-consult zookeeper.

Phil

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 31 January 2012 18:16
To: solr-user@lucene.apache.org
Subject: Re: removing cores solrcloud

On Jan 31, 2012, at 1:03 PM, Phil Hoy wrote:
> Hi Mark,
>
> I am using the embedded zookeeper server, how would you recommend I connect to it so that I can remove the missing core, or is it only possible when using a stand-alone zookeeper instance?

Nope, both cases are the same - you just need a ZK tool and the ZK address to connect that tool to ZK. ZK itself comes with some command line scripts that you could use - there are also a couple GUI tools out there. If you use eclipse, my favorite way to interact with ZK is http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

I think (hard to remember what came in when) you just have to remove the node from /node_states and the overseer will update the cluster state. Sami Siren might be able to comment more on that. I am looking into doing this automatically when you unload a SolrCore - https://issues.apache.org/jira/browse/SOLR-3080

> You are of course correct, the reload command as well as a few others should cause a resync with the zookeeper state too.
>
> I am currently using version 4.0.0.2011.12.12.09.26.56.
> Phil
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: 31 January 2012 16:09
> To: solr-user@lucene.apache.org
> Subject: Re: removing cores solrcloud
>
> On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:
>
>> Hi,
>>
>> I am running solrcloud and i am able to add cores http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how does one remove cores? If i use the core admin unload command, distributed queries then error as they still query the removed core. Do I need to update zookeeper somehow?
>>
>> Phil
>
> Hey Phil - yeah, currently you would have to manually remove the core from zookeeper. Once we see it, we expect it to be part of the index - perhaps we should remove it on an explicit core reload though?
>
> What version of trunk are you using?
>
> - Mark Miller
> lucidimagination.com

- Mark Miller
lucidimagination.com
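The workaround described in this thread, unload the dead core and then reload any surviving core in the collection to force the distributed searcher to re-consult zookeeper, boils down to two CoreAdmin calls. Sketched as URL construction only; host and core names are placeholders:

```python
def core_admin_url(host, action, core):
    """Build a Solr CoreAdmin request; action is e.g. UNLOAD or RELOAD."""
    return "http://%s/solr/admin/cores?action=%s&core=%s" % (host, action, core)

# Step 1: drop the removed core; step 2: poke a surviving core
# so the cluster state is re-read.
unload = core_admin_url("localhost:8983", "UNLOAD", "aaa-ava")
reload_ = core_admin_url("localhost:8983", "RELOAD", "avb-bel")
```

Issuing the two requests in that order (with any HTTP client) reproduces the manual fix described above.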
RE: removing cores solrcloud
Hi Mark,

I am using the embedded zookeeper server, how would you recommend I connect to it so that I can remove the missing core, or is it only possible when using a stand-alone zookeeper instance?

You are of course correct, the reload command as well as a few others should cause a resync with the zookeeper state too.

I am currently using version 4.0.0.2011.12.12.09.26.56.

Phil

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 31 January 2012 16:09
To: solr-user@lucene.apache.org
Subject: Re: removing cores solrcloud

On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:
> Hi,
>
> I am running solrcloud and i am able to add cores http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how does one remove cores? If i use the core admin unload command, distributed queries then error as they still query the removed core. Do I need to update zookeeper somehow?
>
> Phil

Hey Phil - yeah, currently you would have to manually remove the core from zookeeper. Once we see it, we expect it to be part of the index - perhaps we should remove it on an explicit core reload though?

What version of trunk are you using?

- Mark Miller
lucidimagination.com
removing cores solrcloud
Hi,

I am running solrcloud and I am able to add cores (http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin), but how does one remove cores? If I use the core admin unload command, distributed queries then error as they still query the removed core. Do I need to update zookeeper somehow?

Phil
solrcloud replicating new cores
Hi,

Is it possible to configure solr, using solrcloud and the distribution handler, such that if a new core is added to the master then that core is added and replicated to the slaves?

Phil
RE: DirectSolrSpellChecker on request specified field.
Added issue: https://issues.apache.org/jira/browse/SOLR-2926

Please let me know if more information needs adding to the JIRA.

Phil

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: 28 November 2011 19:32
To: solr-user@lucene.apache.org
Subject: Re: DirectSolrSpellChecker on request specified field.

Technically it could? I'm just not sure if the current spellchecking apis allow for it. But maybe someone has a good idea on how to easily expose this. I think it's a good idea. Care to open a JIRA issue?

On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy wrote:
> Hi,
>
> Can the DirectSolrSpellChecker be used for autosuggest, but defer to request time the name of the field used to create the dictionary? That way I don't have to define spellcheckers specific to each field, which for me is not really possible as the fields I wish to spell check are DynamicFields.
>
> I could copy all dynamic fields into a 'spellcheck' field, but then I could get false suggestions if I use it to get suggestions for a particular dynamic field where a term returned derives from a different field.
>
> Phil

--
lucidimagination.com
DirectSolrSpellChecker on request specified field.
Hi, Can the DirectSolrSpellChecker be used for autosuggest while deferring to request time the name of the field used to create the dictionary? That way I don't have to define spellcheckers specific to each field, which for me is not really possible as the fields I wish to spell check are dynamic fields. I could copy all dynamic fields into a 'spellcheck' field, but then I could get false suggestions if I use it to get suggestions for a particular dynamic field where a returned term derives from a different field. Phil
RE: Sort question
You might be able to sort by the map function: q=*:*&sort=map(price,0,100,10) asc, price asc. Phil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 25 November 2011 13:49
To: solr-user@lucene.apache.org
Subject: Re: Sort question

Not that I know of. You could conceivably do some work at index time to create a field that would sort in that order, by doing some sort of mapping from these values into a field that sorts the way you want, or you might be able to do a plugin.

Best
Erick

On Wed, Nov 23, 2011 at 3:29 AM, vraa wrote:

> Hi
>
> I have a query where I sort by a column "price". This field can contain the following values:
>
> 10
> 75000
> 15
> 1
> 225000
> 50
> 40
>
> I want to sort these values so that values between 0 and 100 always come last.
>
> E.g. sorting by price asc should look like this:
> 75000
> 10
> 15
> 225000
> 1
> 40
> 50
>
> Is this possible?
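Not from the thread itself, but the map() suggestion above can be sketched in plain Python to show the intended ordering. The sentinel value 10,000,000 is an illustrative assumption, chosen only to exceed any real price; Solr's map(price,0,100,X) rewrites values in [0, 100] to X, so with a large X they fall to the end of an ascending sort.

```python
# Sketch of what "sort=map(price,0,100,BIG) asc, price asc" would do,
# mirrored as a Python sort key. BIG is a hypothetical sentinel value.
BIG = 10_000_000  # assumption: larger than any real price in the index

def sort_key(price):
    # map(price,0,100,BIG): values in [0, 100] become BIG;
    # the secondary key mirrors the ", price asc" tiebreaker.
    mapped = BIG if 0 <= price <= 100 else price
    return (mapped, price)

prices = [10, 75000, 15, 1, 225000, 50, 40]  # values from the thread
print(sorted(prices, key=sort_key))
```

Prices above 100 sort first in their natural order, and the 0-100 group trails behind them, sorted among themselves.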
RE: Query a field with no value or a particular value.
Hi, Thanks for getting back to me, and sorry: the default q value was *:*, so I omitted it from the example. I do not have a problem getting the null values (q=*:*&fq=-field:[* TO *] indeed works), but I also need the docs with a specific value, e.g. fq=field:yes. Is this possible? Phil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 25 November 2011 13:59
To: solr-user@lucene.apache.org
Subject: Re: Query a field with no value or a particular value.

You haven't specified any "q" clause, just an "fq" clause. Try q=*:* -field:[* TO *] or q=*:*&fq=-field:[* TO *]

BTW, the logic of field:yes -field:[* TO *] makes no sense. You're saying "find me all the fields containing the value 'yes' and remove from that set all the fields containing any value at all"...

Best
Erick

On Fri, Nov 25, 2011 at 7:28 AM, Phil Hoy wrote:

> Hi,
>
> Is it possible to constrain the results of a query to return docs where a field contains no value or a particular value?
>
> I tried ?fq=(field:yes OR -field:[* TO *]) but I get no results, even though queries with either ?fq=field:yes or ?fq=-field:[* TO *] do return results.
>
> Phil
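A commonly cited workaround for this situation, not stated in the thread itself, is that a purely negative subclause matches nothing on its own inside parentheses, so it needs an explicit *:* anchor before being negated. A sketch of the resulting request parameters, using the placeholder field name "field" and value "yes" from the question:

```python
from urllib.parse import urlencode

# Sketch, not a tested answer from the thread: anchor the pure-negative
# subclause with *:* so it has something to subtract from.
params = {
    "q": "*:*",
    "fq": "field:yes OR (*:* -field:[* TO *])",
}

# The encoded query string, ready to append to a /select request.
print(urlencode(params))
```

The same anchoring trick applies whenever a negative clause is OR'd or nested, since Lucene-style negation is a set subtraction, not a standalone match.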
Query a field with no value or a particular value.
Hi, Is it possible to constrain the results of a query to return docs where a field contains no value or a particular value? I tried ?fq=(field:yes OR -field:[* TO *]) but I get no results, even though queries with either ?fq=field:yes or ?fq=-field:[* TO *] do return results. Phil
NullPointerException with distributed facets
Hi, When doing a distributed query in solr 4.0 (4.0.0.2011.06.25.15.36.22) with facet.missing=true and facet.limit=20, I get a NullPointerException. Increasing the facet limit to 200, or setting facet.missing to false, seems to fix it. Both shards contain the field, but one shard always has a value and one never has a value. Single-shard queries work fine on each shard. Does anyone know the cause or a fix?

java.lang.NullPointerException
    at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:489)
    at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:278)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1452)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Phil
RE: SolrCloud with large synonym files
I tried adding the property but it did not seem to improve things. I did, however, get it working by noticing that the ZkSolrResourceLoader has a fallback to load resources from the shared lib; this worked for me. Thanks for getting back to me. Phil

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 02 November 2011 15:06
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with large synonym files

On Nov 2, 2011, at 7:47 AM, Phil Hoy wrote:

> Hi,
>
> I am running solrcloud and a file in the -Dbootstrap_confdir directory is a large synonym file (~50mb) used by a SynonymFilterFactory configured in the schema.xml. When I start solr I get a zookeeper exception, presumably because the file size is too large.
>
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>
> Is there a way to either increase the limit in zookeeper, or perhaps configure the SynonymFilterFactory to get the file from somewhere external to -Dbootstrap_confdir?
>
> Phil

As a workaround you can try the Java system property jute.maxbuffer:

This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients, otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size.

Eventually there are other ways to solve this that we may offer:
- Optional compression of files
- Store a file across multiple zk nodes transparently when size is too large

- Mark Miller
lucidimagination.com
RE: SolrCloud with large synonym files
It is solr 4.0 and uses the new FSTSynonymFilterFactory, I believe, but defers to ZkSolrResourceLoader to load the synonym file when in cloud mode. Phil

-----Original Message-----
From: ☼ 林永忠 ☼ (Yung-chung Lin) [mailto:henearkrx...@gmail.com]
Sent: 02 November 2011 12:24
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with large synonym files

Hi, I didn't use Solr with Zookeeper before. But Solr 3.4 implements the synonym module with a different data structure. If the version of your Solr is not 3.4, then maybe you can try upgrading it first.

See also this thread on Stack Overflow: http://stackoverflow.com/questions/6747664/solr-and-big-synonym-file

Yung-chung Lin

2011/11/2 Phil Hoy

> Hi,
>
> I am running solrcloud and a file in the -Dbootstrap_confdir directory is a large synonym file (~50mb) used by a SynonymFilterFactory configured in the schema.xml. When I start solr I get a zookeeper exception, presumably because the file size is too large.
>
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>
> Is there a way to either increase the limit in zookeeper, or perhaps configure the SynonymFilterFactory to get the file from somewhere external to -Dbootstrap_confdir?
>
> Phil
SolrCloud with large synonym files
Hi, I am running solrcloud and a file in the -Dbootstrap_confdir directory is a large synonym file (~50mb) used by a SynonymFilterFactory configured in the schema.xml. When I start solr I get a zookeeper exception, presumably because the file size is too large.

Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)

Is there a way to either increase the limit in zookeeper, or perhaps configure the SynonymFilterFactory to get the file from somewhere external to -Dbootstrap_confdir? Phil