Shard tolerant partial results

2013-07-01 Thread Phil Hoy
Hi,

When doing distributed searches with shards.tolerant set whilst the hosts for a 
slice are down, the response is partial. How is that best inferred? We would 
like to avoid caching the results upstream and perhaps inform the end user in 
some way.

I am aware that shards.info could be used; however, I am concerned this may 
have performance implications, both from the cost of parsing the response from 
Solr and perhaps from some extra cost incurred by Solr to generate it.

Perhaps an HTTP header could be added, or another attribute added to the Solr 
result node.
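
In the meantime, partial responses can be detected client-side via shards.info. 
Below is a minimal SolrJ sketch of the idea; the shard URLs are placeholders, 
and the exact shape of the shards.info section should be checked against the 
Solr version in use:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class PartialResultCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "host1:8983/solr,host2:8983/solr"); // placeholder slices
        q.set("shards.tolerant", true);
        q.set("shards.info", true);

        QueryResponse rsp = server.query(q);

        // Each shards.info entry is a per-shard NamedList; a shard that failed
        // should carry an "error" entry instead of the usual numFound/time.
        NamedList<?> shardsInfo = (NamedList<?>) rsp.getResponse().get("shards.info");
        boolean partial = false;
        for (int i = 0; i < shardsInfo.size(); i++) {
            NamedList<?> shard = (NamedList<?>) shardsInfo.getVal(i);
            if (shard.get("error") != null) {
                partial = true;
            }
        }
        if (partial) {
            // e.g. skip the upstream cache and flag the response to the user
            System.out.println("Partial results: at least one shard failed");
        }
    }
}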

Phil


RE: Solr load balancer

2013-02-13 Thread Phil Hoy
Hi,

I have opened a couple of JIRA issues: one to make the HttpShardHandlerFactory 
and LBHttpSolrServer more easily extended, 
https://issues.apache.org/jira/browse/SOLR-4448, and one with an implementation 
of a backup-requesting load balancer, 
https://issues.apache.org/jira/browse/SOLR-4449.

The implementation does not attempt to cancel in-flight requests once a 
successful response is received; in fact it returns the successful response 
immediately and then allows the in-flight requests to complete. That way it can 
detect 'zombie' servers in a way similar to the current load balancer and not 
send them requests for a specified time.
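
For anyone curious, the pattern itself in rough outline (this is an 
illustration only, not the SOLR-4449 code; the class and method names are made 
up):

import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackupRequestBalancer {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Send to one replica; if it is slower than backupDelayMs, also send to a
    // second replica and return whichever answers first. Losing requests are
    // deliberately not cancelled, so slow servers can be spotted as zombies.
    public <T> T query(Callable<T> primary, Callable<T> backup, long backupDelayMs)
            throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<T>(pool);
        cs.submit(primary);
        boolean backupSent = false;
        Future<T> done = cs.poll(backupDelayMs, TimeUnit.MILLISECONDS);
        if (done == null) {                  // primary exceeded the latency budget
            cs.submit(backup);
            backupSent = true;
            done = cs.take();                // first of the two to finish
        }
        try {
            return done.get();               // return the first response immediately
        } catch (ExecutionException e) {
            if (!backupSent) {
                cs.submit(backup);           // primary failed outright: try backup
            }
            return cs.take().get();
        }
    }
}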

Phil

-Original Message-
From: Jeff Wartes [mailto:jwar...@whitepages.com] 
Sent: 01 February 2013 01:51
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer


For what it's worth, Google has done some pretty interesting research into 
coping with the idea that particular shards might very well be busy doing 
something else when your query comes in.

Check out this slide deck: http://research.google.com/people/jeff/latency.html
Lots of interesting ideas, but in particular, around slide 39 he talks about 
"backup requests" where you wait for something like your typical response time 
and then issue a second request to a different shard. You take whichever answer 
you get first, and cancel the other. The initial wait + cancellation means your 
extra cluster load is minimal, and you still get the benefit of reducing your 
p95+ response times if the first request was high-latency due to something 
unrelated to the query. (Say, GC.)

Of course, a central principle of this approach is being able to cancel a query 
and have it stop consuming resources. I'd love to be corrected, but I don't 
think Solr allows this. You can stop waiting for a response, but even the 
timeAllowed param doesn't seem to stop resource usage after the allotted time.  
Meaning, a few exceptionally long-running queries can take out your 
high-throughput cluster by tying up entire CPUs for long periods.

Let me know the JIRA number, I'd love to see work in this area.


-Original Message-
From: Phil Hoy [mailto:p...@brightsolid.com]
Sent: Tuesday, January 29, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

Hi Erick,

Thanks, I have read the blogs you cited and found them very interesting, and we 
have tuned the JVM accordingly, but we still get the odd longish GC pause.

That said, we perhaps have an unusual setup: we index a lot of small documents 
using servers with SSDs and 128 GB of RAM in a sharded setup with replicas, and 
our queries rely heavily on query filters and faceting, with minimal free-text 
style searching. For that reason we depend heavily on the filter cache to 
improve query latency, so we assign a large percentage of the available RAM to 
the JVM hosting Solr.

Anyhow, we are happy with the current configuration and performance profile, 
aside from the odd GC pause, and as we have index replicas it seems to me that 
we should be able to cope, hence my willingness to tweak how the load balancer 
behaves.

Thanks,
Phil



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's 
a great place to start:

http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to 
multiple nodes in your cluster, you'll be effectively doubling (at least) the 
load on your system. Leading to more memory issues perhaps in a "non-virtuous 
cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy  wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query 
> latency in the face of long gc pauses and the odd time-consuming query that 
> we need to be able to support. At the moment setting the socket timeout via 
> the HttpShardHandlerFactory does help, but of course it can only be set to a 
> length of time as long as the most time consuming query we are likely to 
> receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently 
> to all/some replicas and only keeps the first response might be effective. Or 
> maybe a load balancer which takes account of the frequency of timeouts would 
> be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having 
> to hack solr directly, I would need to be able to make the existing 
> LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension, I 
> can then override the default load balancer using solr's plugin mechanism.

RE: Solr load balancer

2013-01-31 Thread Phil Hoy
Hi,

So am I correct in thinking that I add the JIRA issue myself? If so, can I 
target it at the 4.2 release? Also, I have further questions about the scope of 
my patch; should those be left to the comments on the JIRA itself?

Phil

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: 22 January 2013 17:25
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hi Phil,

Have a look at http://wiki.apache.org/solr/HowToContribute and thank you in 
advance! :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy  wrote:

> Hi,
>
> I would like to experiment with some custom load balancers to help 
> with query latency in the face of long gc pauses and the odd 
> time-consuming query that we need to be able to support. At the moment 
> setting the socket timeout via the HttpShardHandlerFactory does help, 
> but of course it can only be set to a length of time as long as the 
> most time consuming query we are likely to receive.
>
> For example perhaps a load balancer that sends multiple queries 
> concurrently to all/some replicas and only keeps the first response 
> might be effective. Or maybe a load balancer which takes account of 
> the frequency of timeouts would be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without 
> having to hack solr directly, I would need to be able to make the 
> existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to 
> extension, I can then override the default load balancer using solr's plugin 
> mechanism.
>
> So my question is, if I made a patch to make the load balancer more 
> pluggable, is this something that would be acceptable and if so what 
> do I do next?
>
> Phil
>



RE: Solr load balancer

2013-01-29 Thread Phil Hoy
Hi Erick,

Thanks, I have read the blogs you cited and found them very interesting, and we 
have tuned the JVM accordingly, but we still get the odd longish GC pause.

That said, we perhaps have an unusual setup: we index a lot of small documents 
using servers with SSDs and 128 GB of RAM in a sharded setup with replicas, and 
our queries rely heavily on query filters and faceting, with minimal free-text 
style searching. For that reason we depend heavily on the filter cache to 
improve query latency, so we assign a large percentage of the available RAM to 
the JVM hosting Solr.

Anyhow, we are happy with the current configuration and performance profile, 
aside from the odd GC pause, and as we have index replicas it seems to me that 
we should be able to cope, hence my willingness to tweak how the load balancer 
behaves.

Thanks,
Phil



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's 
a great place to start:

http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to 
multiple nodes in your cluster, you'll be effectively doubling (at least) the 
load on your system. Leading to more memory issues perhaps in a "non-virtuous 
cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy  wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query 
> latency in the face of long gc pauses and the odd time-consuming query that 
> we need to be able to support. At the moment setting the socket timeout via 
> the HttpShardHandlerFactory does help, but of course it can only be set to a 
> length of time as long as the most time consuming query we are likely to 
> receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently 
> to all/some replicas and only keeps the first response might be effective. Or 
> maybe a load balancer which takes account of the frequency of timeouts would 
> be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having 
> to hack solr directly, I would need to be able to make the existing 
> LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension, I 
> can then override the default load balancer using solr's plugin mechanism.
>
> So my question is, if I made a patch to make the load balancer more 
> pluggable, is this something that would be acceptable and if so what do I do 
> next?
>
> Phil
>


Solr load balancer

2013-01-18 Thread Phil Hoy
Hi,

I would like to experiment with some custom load balancers to help with query 
latency in the face of long GC pauses and the odd time-consuming query that we 
need to be able to support. At the moment, setting the socket timeout via the 
HttpShardHandlerFactory does help, but of course it can only be set to a length 
of time as long as the most time-consuming query we are likely to receive.

For example, perhaps a load balancer that sends multiple queries concurrently 
to all/some replicas and only keeps the first response might be effective. Or 
maybe a load balancer which takes account of the frequency of timeouts would be 
able to recognize zombies more effectively.

To use alternative load balancer implementations cleanly, and without having to 
hack Solr directly, I would need to make the existing LBHttpSolrServer and 
HttpShardHandlerFactory more amenable to extension; I could then override the 
default load balancer using Solr's plugin mechanism.
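
To illustrate the kind of extension point I have in mind (a purely hypothetical 
sketch: the protected createLoadbalancer hook shown here does not exist in 
stock Solr, and exposing something like it is the point of the patch):

import java.net.MalformedURLException;
import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.handler.component.HttpShardHandlerFactory;

public class PluggableShardHandlerFactory extends HttpShardHandlerFactory {
    // Hypothetical hook: let a subclass supply the load balancer used for
    // distributed requests instead of the stock round-robin one.
    @Override
    protected LBHttpSolrServer createLoadbalancer(HttpClient httpClient)
            throws MalformedURLException {
        // A custom LBHttpSolrServer subclass (e.g. one that issues backup
        // requests) would be returned here; the stock one is a placeholder.
        return new LBHttpSolrServer(httpClient);
    }
}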

So my question is, if I made a patch to make the load balancer more pluggable, 
is this something that would be acceptable and if so what do I do next?

Phil


RE: multi-core sharing synonym map

2012-10-12 Thread Phil Hoy
Yes, I was thinking the same thing, although I was hoping there was a more 
elegant mechanism exposed by the Solr infrastructure code to handle the shared 
map, aside from just using a global, that is.

Phil

-Original Message-
From: simon [mailto:mtnes...@gmail.com] 
Sent: 12 October 2012 19:38
To: solr-user@lucene.apache.org
Subject: Re: multi-core sharing synonym map

I definitely haven't tried this ;=) but perhaps you could create your own 
XXXSynonymFilterFactory  as a subclass of SynonymFilterFactory,  which would 
allow you to share the synonym map across all cores - though I think there 
would need to be a nasty global variable to hold a reference to it...
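
Something along those lines, as a minimal untested sketch (SynonymMap is the 
Lucene class; the cache class below and the way a custom factory would consult 
it are hypothetical):

import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.analysis.synonym.SynonymMap;

// JVM-wide cache so that every core reuses one SynonymMap per synonyms file.
public final class SynonymMapCache {
    private static final ConcurrentHashMap<String, SynonymMap> CACHE =
            new ConcurrentHashMap<String, SynonymMap>();

    // A custom XXXSynonymFilterFactory would call this instead of building its
    // own map; only the first core pays the parse/build cost.
    public static SynonymMap get(String resourceName, Callable<SynonymMap> builder)
            throws Exception {
        SynonymMap map = CACHE.get(resourceName);
        if (map == null) {
            map = builder.call();                       // expensive build
            SynonymMap previous = CACHE.putIfAbsent(resourceName, map);
            if (previous != null) {
                map = previous;                         // another core won the race
            }
        }
        return map;
    }
}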

-Simon

On Fri, Oct 12, 2012 at 12:27 PM, Phil Hoy wrote:

> Hi,
>
> We have a multi-core set up with a fairly large synonym file, all 
> cores share the same schema.xml and synonym file but when solr loads 
> the cores, it loads multiple instances of the synonym map, this is a 
> little wasteful of memory and lengthens the start-up time. Is there a 
> way to get all cores to share the same map?
>
>
> Phil
>




RE: Unique terms without faceting

2012-10-10 Thread Phil Hoy
Hi,

I don't think that component can take any fq or q parameters into account.

Phil

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 10 October 2012 16:51
To: solr-user@lucene.apache.org
Subject: Re: Unique terms without faceting

The Solr TermsComponent:

http://wiki.apache.org/solr/TermsComponent
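
For example, using the /terms handler from the stock example solrconfig (the 
field name here is a placeholder, and terms.limit=-1 asks for all terms):

http://localhost:8983/solr/terms?terms=true&terms.fl=myfield&terms.limit=-1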

-- Jack Krupansky

-Original Message-
From: Phil Hoy
Sent: Wednesday, October 10, 2012 11:45 AM
To: solr-user@lucene.apache.org
Subject: Unique terms without faceting

Hi,

I know that you can use a facet query to get the unique terms for a field, 
taking account of any q or fq parameters, but for our use case the counts are 
not needed. So is there a more efficient way of finding just the unique terms 
for a field?

Phil




Unique terms without faceting

2012-10-10 Thread Phil Hoy
Hi,

I know that you can use a facet query to get the unique terms for a field, 
taking account of any q or fq parameters, but for our use case the counts are 
not needed. So is there a more efficient way of finding just the unique terms 
for a field?

Phil



RE: trunk cloud ui not working

2012-05-22 Thread Phil Hoy
Hi, 

I was using windows 7 but it is fine with chrome on Windows Web Server 2008 R2 
also I asked a colleague with windows 7 and it is fine for him too, so really 
sorry but I think it was a !'works on my machine' thing. 

Of course if I track down the cause I will reply to this email again.

Thanks,
Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 21 May 2012 18:22
To: solr-user@lucene.apache.org
Subject: Re: trunk cloud ui not working

What OS? I was just trying trunk and looking at that view on Chrome on OSX and 
Linux and did not see an issue.

On May 21, 2012, at 1:15 PM, Phil Hoy wrote:

> After further investigation I have found that it is not a problem on firefox, 
> only chrome and IE. 
> 
> Phil
> 
> -Original Message-
> Sent: 21 May 2012 18:05
> To: solr-user@lucene.apache.org
> Subject: trunk cloud ui not working
> 
> Hi,
> 
> I am running from the trunk and the localhost:8983/solr/#/~cloud page shows 
> nothing but "Fetch Zookeeper Data".
> 
> If I run fiddler I see that:
> http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
> and
> http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes
> are called and return data but no update to the ui.
> 
> Cheers,
> Phil
> 
> 

- Mark Miller
lucidimagination.com














RE: trunk cloud ui not working

2012-05-21 Thread Phil Hoy
After further investigation I have found that it is not a problem on Firefox, 
only Chrome and IE.

Phil

-Original Message-
Sent: 21 May 2012 18:05
To: solr-user@lucene.apache.org
Subject: trunk cloud ui not working

Hi,

I am running from the trunk and the localhost:8983/solr/#/~cloud page shows 
nothing but "Fetch Zookeeper Data".

If I run fiddler I see that:
http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
and
http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes
are called and return data but no update to the ui.

Cheers,
Phil




RE: Custom Sharding on solrcloud

2012-03-08 Thread Phil Hoy
Hi,

If I remove the DistributedUpdateProcessorFactory, I will have to manage a 
master-slave setup myself by sending updates solely to the master and 
replicating to any slaves. I wonder: is it possible to have distributed updates 
but confined to the subset of cores and replicas within a collection that share 
the same name?

Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 08 March 2012 01:02
To: solr-user@lucene.apache.org
Subject: Re: Custom Sharding on solrcloud

Hi Phil - 

The default update chain now includes the distributed update processor by 
default - and if in solrcloud mode it will be active.

Probably, what you want to do is define your own update chain (see the wiki). 
Then you can add that update chain as the default for your json update handler 
in solrconfig.xml:

<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>

The default chain is: 

  new LogUpdateProcessorFactory(),
  new DistributedUpdateProcessorFactory(),
  new RunUpdateProcessorFactory()

So just use Log and Run instead to get your old behavior.
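
A chain with just those two would be defined in solrconfig.xml along these 
lines:

<updateRequestProcessorChain name="mychain">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>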

- Mark

On Mar 7, 2012, at 1:37 PM, Phil Hoy wrote:

> Hi,
> 
> We have a large index and would like to shard by a particular field value, in 
> our case surname. This way we can scale out to multiple machines, yet as most 
> queries filter on surname we can use some application logic to hit just the 
> one core to get the results we need.
> 
> Furthermore as we anticipate the index will grow over time so it make sense 
> (to us) to host a number of shards on a single machine until they get too big 
> at which point we can then move them to another machine.
> 
> We are using solrcloud and it is set up using a solrcore per shard, that way 
> we can direct both queries and updates to the appropriate core/shard. To do 
> this our solr.xml looks a bit like this:
> 
> <cores zkClientTimeout="1" hostPort="8983">
>   <core name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
>   <core name="avb-bel" instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
>   ...
> </cores>
> 
> Directed updates via:
> http://server/solr/aaa-ava/update/json  [{surname:"adams"}]
> 
> Directed queries via:
> http://server/solr/select?surname:adams&shards=aaa-ava
> 
> This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13  
> before the more recent solrcloud changes but now the update is not directed 
> to the appropriate core. Is there a better way to achieve our needs?
> 
> Phil
> 

- Mark Miller
lucidimagination.com














Custom Sharding on solrcloud

2012-03-07 Thread Phil Hoy
Hi,

We have a large index and would like to shard by a particular field value, in 
our case surname. This way we can scale out to multiple machines, yet as most 
queries filter on surname we can use some application logic to hit just the one 
core to get the results we need.

Furthermore, as we anticipate the index will grow over time, it makes sense (to 
us) to host a number of shards on a single machine until they get too big, at 
which point we can move them to another machine.

We are using solrcloud and it is set up using a solrcore per shard, that way we 
can direct both queries and updates to the appropriate core/shard. To do this 
our solr.xml looks a bit like this:

<cores zkClientTimeout="1" hostPort="8983">
  <core name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
  <core name="avb-bel" instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
  ...
</cores>

Directed updates via:
http://server/solr/aaa-ava/update/json  [{surname:"adams"}]

Directed queries via:
http://server/solr/select?surname:adams&shards=aaa-ava

This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13  before 
the more recent solrcloud changes but now the update is not directed to the 
appropriate core. Is there a better way to achieve our needs?

Phil



RE: removing cores solrcloud

2012-02-01 Thread Phil Hoy
Hi,

I have tried removing the entry from ZooKeeper as well as from Solr via 
admin/cores?action=UNLOAD, and still the distributed query hits the missing 
core. I guess there is no ZooKeeper watcher in Solr to update the core/shard 
state used by search.

I got round the problem by doing the above and then running 
admin/cores?action=RELOAD on any core in the collection; this seems to force 
Solr's distributed searcher to re-consult ZooKeeper.
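
For reference, the whole sequence looks something like this (host and core 
names are examples from our setup):

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=aaa-ava
(then remove the core's entry from ZooKeeper as above)
http://localhost:8983/solr/admin/cores?action=RELOAD&core=avb-bel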

Phil


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 31 January 2012 18:16
To: solr-user@lucene.apache.org
Subject: Re: removing cores solrcloud


On Jan 31, 2012, at 1:03 PM, Phil Hoy wrote:

> Hi Mark,
> 
> I am using the embedded zookeeper server, how would you recommend I connect 
> to it so that I can remove the missing core or is it only possible when using 
> a stand-alone zookeeper instance?

Nope, both cases are the same - you just need a ZK tool and the ZK address to 
connect that tool to ZK. ZK itself comes with some command line scripts that 
you could use - there are also a couple of GUI tools out there.

If you use eclipse, my favorite way to interact with ZK is 
http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

I think (hard to remember what came in when) you just have to remove the node 
from /node_states and the overseer will update the cluster state. Sami Siren 
might be able to comment more on that.

I am looking into doing this automatically when you unload a SolrCore - 
https://issues.apache.org/jira/browse/SOLR-3080

> 
> You are of course correct the reload command as well a few others should 
> cause a resync with the zookeepers state too.
> 
> I am currently using version 4.0.0.2011.12.12.09.26.56.
> 
> Phil
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: 31 January 2012 16:09
> To: solr-user@lucene.apache.org
> Subject: Re: removing cores solrcloud
> 
> 
> On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:
> 
>> Hi,
>> 
>> I am running solrcloud and i am able to add cores 
>> http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how 
>> does one remove cores. If i use the core admin unload command, distributed 
>> queries then error as they still query the removed core. Do I need to update 
>> zookeeper somehow?
>> 
>> Phil
> 
> 
> Hey Phil - yeah, currently you would have to manually remove the core from 
> zookeeper. Once we see it, we expect it to be part of the index - perhaps we 
> should remove it on an explicit core reload though?
> 
> What version of trunk are you using?
> 
> - Mark Miller
> lucidimagination.com
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

- Mark Miller
lucidimagination.com














RE: removing cores solrcloud

2012-01-31 Thread Phil Hoy
Hi Mark,

I am using the embedded ZooKeeper server; how would you recommend I connect to 
it so that I can remove the missing core? Or is it only possible when using a 
stand-alone ZooKeeper instance?

You are of course correct that the reload command, as well as a few others, 
should cause a resync with ZooKeeper's state too.

I am currently using version 4.0.0.2011.12.12.09.26.56.

Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 31 January 2012 16:09
To: solr-user@lucene.apache.org
Subject: Re: removing cores solrcloud


On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:

> Hi,
> 
> I am running solrcloud and i am able to add cores 
> http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how 
> does one remove cores. If i use the core admin unload command, distributed 
> queries then error as they still query the removed core. Do I need to update 
> zookeeper somehow?
> 
> Phil


Hey Phil - yeah, currently you would have to manually remove the core from 
zookeeper. Once we see it, we expect it to be part of the index - perhaps we 
should remove it on an explicit core reload though?

What version of trunk are you using?

- Mark Miller
lucidimagination.com














removing cores solrcloud

2012-01-31 Thread Phil Hoy
Hi,

I am running solrcloud and I am able to add cores 
(http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin), but how 
does one remove cores? If I use the core admin unload command, distributed 
queries then error as they still query the removed core. Do I need to update 
ZooKeeper somehow?

Phil


solrcloud replicating new cores

2012-01-11 Thread Phil Hoy
Hi, 

Is it possible to configure Solr, using solrcloud and the distribution handler, 
such that if a new core is added to the master, that core is added and 
replicated to the slaves?

Phil


RE: DirectSolrSpellChecker on request specified field.

2011-11-28 Thread Phil Hoy
Added issue: https://issues.apache.org/jira/browse/SOLR-2926
Please let me know if more information needs adding to JIRA.

Phil

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: 28 November 2011 19:32
To: solr-user@lucene.apache.org
Subject: Re: DirectSolrSpellChecker on request specified field.

technically it could? I'm just not sure if the current spellchecking
apis allow for it? But maybe someone has a good idea on how to easily
expose this.

I think its a good idea.

Care to open a JIRA issue?

On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy  wrote:
> Hi,
>
> Can the DirectSolrSpellChecker be used for autosuggest but defer to request 
> time the name of the field to use to create the dictionary. That way I don't 
> have to define spellcheckers specific to each field which for me is not 
> really possible as the fields I wish to spell check are DynamicFields.
>
> I could copy all dynamic fields into a 'spellcheck' field but then I could 
> get false suggestions if I use it to get suggestions for a particular dynamic 
> field where a term returned derives from a different field.
>
> Phil
>
>
>



-- 
lucidimagination.com



DirectSolrSpellChecker on request specified field.

2011-11-28 Thread Phil Hoy
Hi,

Can the DirectSolrSpellChecker be used for autosuggest while deferring to 
request time the name of the field used to create the dictionary? That way I 
don't have to define spellcheckers specific to each field, which for me is not 
really possible, as the fields I wish to spellcheck are DynamicFields.

I could copy all dynamic fields into a 'spellcheck' field but then I could get 
false suggestions if I use it to get suggestions for a particular dynamic field 
where a term returned derives from a different field. 
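
(To be clear, the copy I mean is the standard schema.xml mechanism, roughly:

<copyField source="*_txt" dest="spellcheck" />

with *_txt standing in for our dynamic field pattern and spellcheck being a 
suitably analysed catch-all field; the problem is that the merged field loses 
the per-field distinction.)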

Phil




RE: Sort question

2011-11-25 Thread Phil Hoy
You might be able to sort with the map function, mapping the 0-100 range to a 
value larger than any real price, e.g. q=*:*&sort=map(price,0,100,10000000) 
asc, price asc.

Phil

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 25 November 2011 13:49
To: solr-user@lucene.apache.org
Subject: Re: Sort question

Not that I know of. You could conceivably do some
work at index time to create a field that would sort
in that order by doing some sort of mapping from
these values into a field that sorts the way you
want, or you might be able to do a plugin

Best
Erick

On Wed, Nov 23, 2011 at 3:29 AM, vraa  wrote:
> Hi
>
> I have a query where i sort by a column "price". This field can contain the
> following values
>
> 100000
> 75000
> 150000
> 1000000
> 225000
> 50
> 40
>
> I want to sort these values so that values between 0 and 100 always come
> last.
>
> Eg sorting by price asc should look like this:
> 75000
> 100000
> 150000
> 225000
> 1000000
> 40
> 50
>
> Is this possible?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Sort-question-tp3530070p3530070.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



RE: Query a field with no value or a particular value.

2011-11-25 Thread Phil Hoy
Hi,

Thanks for getting back to me, and sorry: the default q value was *:*, so I 
omitted it from the example.

I do not have a problem getting the null values, so q=*:*&fq=-field:[* TO *] 
indeed works, but I also need the docs with a specific value, e.g. fq=field:yes. 
Is this possible?

Phil

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 25 November 2011 13:59
To: solr-user@lucene.apache.org
Subject: Re: Query a field with no value or a particular value.

You haven't specified any "q" clause, just an "fq" clause. Try
q=*:* -field:[* TO *]
or
q=*:*&fq=-field:[* TO *]

BTW, the logic of field:yes -field:[* TO *] makes no sense
You're saying "find me all the fields containing the value "yes" and
remove from that set all the fields containing any value at all"....
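
For completeness, the usual way to get "yes OR no value at all" in a single 
filter is to make the negative clause a complete query of its own, since a 
purely negative subclause inside an OR cannot be evaluated on its own; 
something like:

fq=field:yes OR (*:* -field:[* TO *])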

Best
Erick

On Fri, Nov 25, 2011 at 7:28 AM, Phil Hoy  wrote:
> Hi,
>
> Is it possible to constrain the results of a query to return docs were a 
> field contains no value or a particular value?
>
> I tried  ?fq=(field:yes OR -field:[* TO *]) but I get no results even though 
> queries with either ?fq=field:yes or ?fq=-field:[* TO *]) do return results.
>
>
> Phil
>



Query a field with no value or a particular value.

2011-11-25 Thread Phil Hoy
Hi,

Is it possible to constrain the results of a query to return docs where a field 
contains no value or a particular value?

I tried ?fq=(field:yes OR -field:[* TO *]) but I get no results, even though 
queries with either ?fq=field:yes or ?fq=-field:[* TO *] do return results.


Phil


NullPointerException with distributed facets

2011-11-22 Thread Phil Hoy
Hi,

When doing a distributed query in Solr 4.0 (4.0.0.2011.06.25.15.36.22) with 
facet.missing=true and facet.limit=20, I get a NullPointerException. Increasing 
the facet limit to 200 or setting facet.missing to false seems to fix it. Both 
shards contain the field, but one shard always has a value and one never has a 
value. Single-shard queries work fine on each shard. Does anyone know the cause 
or a fix?

java.lang.NullPointerException
at 
org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:489)
at 
org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:278)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1452)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Phil


RE: SolrCloud with large synonym files

2011-11-02 Thread Phil Hoy
I tried adding the property, but it did not seem to improve things. I did, 
however, get it working after noticing that the ZkSolrResourceLoader has a 
fallback to load resources from the shared lib directory.

Thanks for getting back to me.
Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 02 November 2011 15:06
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with large synonym files


On Nov 2, 2011, at 7:47 AM, Phil Hoy wrote:

> Hi,
> 
> I am running solrcloud and a file in the Dbootstrap_confdir is a large large 
> synonym file (~50mb ) used by a SynonymFilterFactory configured in the 
> schema.xml. When i start solr I get a zookeeper exception presumably because 
> the file size is too large. 

> 
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> 
> Is there a way to either increase the limit in zookeeper or perhaps configure 
> the SynonymFilterFactory differently to get the file from somewhere external 
> to Dbootstrap_confdir?
> 
> Phil  


As a workaround you can try:

(Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no
zookeeper prefix on it. It specifies the maximum size of the data
that can be stored in a znode. The default is 0xfffff, or just under
1M. If this option is changed, the system property must be set on
all servers and clients, otherwise problems will arise. This is
really a sanity check. ZooKeeper is designed to store data on the
order of kilobytes in size.
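
For example, something like the following on the ZooKeeper server and on every 
Solr JVM that talks to it (the exact value is your choice; 50 MB shown to cover 
the file above):

java -Djute.maxbuffer=52428800 -jar start.jar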

Eventually there are other ways to solve this that we may offer...

Optional compression of files
Store a file across multiple zk nodes transparently when size is too large

- Mark Miller
lucidimagination.com














RE: SolrCloud with large synonym files

2011-11-02 Thread Phil Hoy
It is Solr 4.0 and uses the new FSTSynonymFilterFactory, I believe, but it 
defers to ZkSolrResourceLoader to load the synonym file when in cloud mode.
Phil

-Original Message-
From: ☼ 林永忠 ☼ (Yung-chung Lin) [mailto:henearkrx...@gmail.com] 
Sent: 02 November 2011 12:24
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with large synonym files

Hi,

I didn't use Solr with Zookeeper before. But Solr 3.4 implements the
synonym module with a different data structure. If the version of your Solr
is not 3.4, then maybe you can try upgrading it first.

See also this thread on stackoverflow.
http://stackoverflow.com/questions/6747664/solr-and-big-synonym-file

Yung-chung Lin

2011/11/2 Phil Hoy 

> Hi,
>
> I am running solrcloud and a file in the Dbootstrap_confdir is a large
> large synonym file (~50mb ) used by a SynonymFilterFactory configured in
> the schema.xml. When i start solr I get a zookeeper exception presumably
> because the file size is too large.
>
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>
> Is there a way to either increase the limit in zookeeper or perhaps
> configure the SynonymFilterFactory differently to get the file from
> somewhere external to Dbootstrap_confdir?
>
> Phil
>




SolrCloud with large synonym files

2011-11-02 Thread Phil Hoy
Hi,

I am running solrcloud, and a file in the Dbootstrap_confdir is a large synonym 
file (~50 MB) used by a SynonymFilterFactory configured in the schema.xml. When 
I start Solr I get a ZooKeeper exception, presumably because the file size is 
too large.

Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)

Is there a way to either increase the limit in zookeeper or perhaps configure 
the SynonymFilterFactory differently to get the file from somewhere external to 
Dbootstrap_confdir?

Phil