Specifying shards when querying an alias.

2017-07-11 Thread philippa griggs
Hello,


Solr 5.4.1


I have two collections, ‘Online’ and ‘Offline’. Both collections use an implicit 
router and are sharded into weekly cores (for example, an Online shard would be 
Online_20170605). I have created an alias called ‘AllData’ to query both 
collections. I want to query the alias but also specify the shards, something 
like this:


http://localhost:8983/solr/AllData/select?q=*%3A*&sort=Session_UTCStartTime+desc&wt=json&indent=true&shards=Online_20170605,Offline_20170529



However, I have noticed that this doesn’t work. If I create the alias like 
this:


http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=AllData&collections=Online,Offline


(with the Online collection mentioned first)


I can use Online_20170605 as a shard but not the Offline shard; for that I 
have to specify the location of the shard.


http://localhost:8983/solr/AllData/select?q=*%3A*&sort=Session_UTCStartTime+desc&wt=json&indent=true&shards=Online_20170605,localhost:8983/solr/offline


If I delete the alias and recreate it with the Offline collection mentioned 
first:


http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=AllData&collections=Offline,Online


I can use Offline_20170529 as a shard but then have to specify the location 
of the shard for the Online collection.


http://localhost:8983/solr/AllData/select?q=*%3A*&sort=Session_UTCStartTime+desc&wt=json&indent=true&shards=localhost:8984/solr/online,Offline_20170529
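
One workaround I am considering is to fully qualify every shard with its node 
address, so the query no longer depends on which collection is listed first in 
the alias. A sketch, assuming the Online cores live on port 8984, the Offline 
cores on port 8983, and that the shard names resolve to core names on those 
nodes:

http://localhost:8983/solr/AllData/select?q=*%3A*&sort=Session_UTCStartTime+desc&wt=json&indent=true&shards=localhost:8984/solr/Online_20170605,localhost:8983/solr/Offline_20170529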



Is this expected behaviour? Or can anyone point out what I’m doing wrong?


Many thanks

Philippa




Both Nodes in shard think they are leader

2017-03-22 Thread philippa griggs
Hello,


I’m using SolrCloud version 5.4.1. I have two cores in a shard (a leader and a 
replica). Every so often they both go into recovery/down and then come back up. 
However, when they come back, they both think they are the leader.


I then have to manually step in, stop them both, start one and wait until it is 
the leader before starting the second one.


Has anyone else seen this before, or does anyone have any suggestions as to why 
this is happening?


Many thanks

Philippa


Implicit routing, delete on specific shard

2017-02-28 Thread philippa griggs
Hello,


Solr 5.4.1 using SolrCloud, multiple cores with two cores per shard. Zookeeper 
3.4.6 (5-node zookeeper ensemble).

We use an implicit router and split shards into weeks. Every now and again I 
need to run a delete on the system. I do this by running the following command 
on one of the instances:

curl http://127.0.0.1:8983/solr/collection1/update/?commit=false -H 
"Content-Type: text/xml" -d "XXX"


Is there any way of specifying the shards to run the delete on, instead of 
running it against the whole collection? I will always know which shards the 
sessions I want to delete are on.

I know when you query, you can do something like this:

http://XXX:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=20170220

Is there a similar function for deletes?

Something like:

curl http://127.0.0.1:8983/solr/collection1/update/?commit=false -H 
"Content-Type: text/xml" -d "XXX" -shard 
"20170220"

Many thanks

Philippa



Core replication, Slave not flipping to master

2017-02-15 Thread philippa griggs
Hello,



Solr 5.4.1, multiple cores with two cores per shard. Zookeeper 3.4.6   (5 
zookeeper ensemble).


I have noticed an error with the replication between two cores in a shard. I’m 
having to perform a schema update, which means I have to stop and start the 
cores. I’m trying to do this in a way that avoids any downtime: restarting one 
core in the shard and waiting for it to come back up before restarting the 
second one.


However, when restarting the master, the slave isn’t flipping to become the 
master itself. Instead I’m getting errors in the log as follows:


Exception while invoking 'details' method for replication on master -Server 
refused connection at xxx


When I run


http://xxx:8983/solr/core_name/replication?command=details


I see the following in the response (the XML tags have been stripped by the 
mail archive):


invalid_master

http://xxx:8983/solr/core_name/

Wed Feb 15 10:44:30 UTC 2017

false

false


Once the old master comes back up again, it comes in as a slave, which is what 
I would expect. However, as the other core hasn’t flipped to become the master, 
I am left with both cores thinking they are slaves.


I would expect that when the master goes down and is unreachable, the slave 
would flip rather than just throw an error about the connection. Does anyone 
have any idea why this is happening, and could you point me towards a fix?


Many thanks

Philippa


Zookeeper connection issues

2016-10-10 Thread philippa griggs
Hello,


Solr Set up


Solr 5.4.1, Zookeeper 3.4.6   (5 zookeeper ensemble)


We have one collection which has multiple shards (two shards for each week). 
Each shard has a leader and a replica. We only write to the latest week's two 
shards (four cores), which we refer to as ‘hot cores’. The rest, the ‘cold 
cores’, are only used for queries. We run multiple Solr processes on an 
instance, currently 5, each with a 15GB heap (there is 122GB of available 
memory). As the index grows through the week, the heap usage starts low and 
increases to around 9-10GB. The index on each core ends up at around 8 million 
docs and 6.5GB, stored on 40GB drives. The zookeeper timeout is 60 seconds.


The issue:


We are experiencing connectivity issues and have started seeing error messages 
about being unable to connect to zookeeper. Most of the time Solr recovers 
itself after a while, but we are seeing these ‘blips’ more and more often, with 
the last ‘blip’ ending in a manual restart of the hot cores. So far this has 
only been seen on one shard at a time; all other shards in the cluster are 
fine.


There is nothing in the zookeeper log. Below are the Solr logs for the last 
‘blip’.


I’ve looked at the heap size and it’s not hitting 15GB (the max is around 
11GB). Around the time of the blip the GC pause is 40 seconds, which is not 
over the timeout but is much larger than we normally see.
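
For reference, GC logging can be enabled with the standard HotSpot flags in 
solr.in.sh to capture these pauses in more detail (the log path is 
illustrative):

SOLR_OPTS="$SOLR_OPTS -verbose:gc -Xloggc:/var/solr/logs/solr_gc.log \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"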


These blips are happening towards the end of the week when the index size gets 
larger.


I’m not sure what is going on. Is this a zookeeper issue or a Solr one? What 
would cause Solr to lose its connection with zookeeper if it’s not the timeout? 
We have checked the network and it doesn’t indicate a network issue.


Any suggestions would be useful.



Error Logs for core A


2016-10-08 18:45:36.617 WARN  (qtp697960108-32664) [c:xxx s:20161003_A 
r:20161003_A54130 x:xxx] o.e.j.h.HttpParser badMessage: 
java.lang.IllegalStateException: too much data after closed for 
HttpChannelOverHttp@1b87c2cf{r=1370,c=false,a=IDLE,uri=-}

2016-10-08 18:45:36.717 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.c.ZkController Unable to read 
/collections/xxx/leader_initiated_recovery/20161003_A/20161003_A54130 due to: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/collections/xxx/leader_initiated_recovery/20161003_A/20161003_A54130

2016-10-08 18:45:44.907 ERROR 
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.u.PeerSync PeerSync: core=xxx url=http://x.x.x.x:8987/solr ERROR, update 
log not in ACTIVE or REPLAY state. FSUpdateLog{state=BUFFERING, 
tlog=tlog{file=/solrLog_8987/tlog/tlog.0011469 refcount=1}}

2016-10-08 18:45:44.908 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.u.PeerSync PeerSync: core=xxx url=http://x.x.x.x:8987/solr too many 
updates received since start - startingUpdates no longer overlaps with our 
currentUpdates

2016-10-08 18:47:25.772 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.h.IndexFetcher File _1ftq.si did not match. expected checksum is 
4254234714 and actual is checksum 2090625558. expected length is 422 and actual 
length is 422

2016-10-08 18:47:26.286 WARN  
(zkCallback-3-thread-76-processing-n:x.x.x.x:8987_solr-EventThread) [   ] 
o.a.s.c.RecoveryStrategy Stopping recovery for core=xxx 
coreNodeName=20161003_A54130

2016-10-08 18:47:54.935 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.h.IndexFetcher File _1ftq.si did not match. expected checksum is 
4254234714 and actual is checksum 2090625558. expected length is 422 and actual 
length is 422

2016-10-08 18:47:54.939 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.h.IndexFetcher File _1ftq.cfs did not match. expected checksum is 
3006669569 and actual is checksum 3691917. expected length is 35641114 and 
actual length is 8402832

2016-10-08 18:47:55.084 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) [c:xxx s:20161003_A r:20161003_A54130 x:xxx] 
o.a.s.h.IndexFetcher File _1ftq.cfe did not match. expected checksum is 
3692862263 and actual is checksum 3783720915. expected length is 289 and actual 
length is 289

2016-10-08 18:47:55.399 WARN  
(updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr x:xxx s:20161003_A 
c:xxx r:20161003_A54130) 

Uploading files to Zookeeper

2016-03-03 Thread philippa griggs
Hello


I have a set of pre-existing configuration files and want to create a new Solr 
cluster. As part of this new cluster I want to name the shards and use the 
compositeId router.



My core.properties file is:


name=sessions

shard=${shard:shard1}

coreNodeName=${coreNodeName:node1}

loadOnStartup=true

transient=false

dataDir=/solr

ulogDir=/solrLog



And I set the shard name in the solr.in.sh using:


SOLR_OPTS="$SOLR_OPTS -Dshard=20160222"



If I upload the files to zookeeper as above, the router is set to implicit. 
I've then realised I need to specify numShards. However, if I add numShards=1 
to the core.properties file and then upload the files, I get a shard called 
'shard1' which isn't linked to anything, as well as my named shard. This 
'shard1' is causing me problems and stopping Solr from working. What am I doing 
wrong? Is there a setting which I am missing?
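
For comparison, this is roughly the Collections API call I would expect to use 
instead of bootstrapping through core.properties (the configset name 
'sessions_conf' is a placeholder for whatever has been uploaded to zookeeper):

http://localhost:8983/solr/admin/collections?action=CREATE&name=sessions&numShards=1&replicationFactor=1&router.name=compositeId&collection.configName=sessions_conf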


Many thanks

Philippa


Multiple solr instances on one server

2016-01-04 Thread philippa griggs
Hello,


(Solr 5.2.1)


I want to run multiple Solr instances on one server. Does anyone know which is 
better: allowing each Solr instance to use its own internal Jetty, or 
installing Jetty on the server?


Many thanks


Philippa


Re: Multiple solr instances on one server

2016-01-04 Thread philippa griggs
Hello,

Thanks for your reply. Do you know if there are many disadvantages to running 
multiple Solr instances, each with its own internal Jetty? I'm trying to work 
out whether this would work or whether I would need to install Jetty myself on 
the machine and use that instead. I'm not sure how many Solr instances I would 
need to run yet; it could be as high as 10.

From: Mugeesh Husain 
Sent: 04 January 2016 15:28
To: solr-user@lucene.apache.org
Subject: Re: Multiple solr instances on one server

You could start Solr on multiple ports, like below:


bin/solr start -p 8983   # first instance
bin/solr start -p 8984   # second instance, and so on, depending on what you need
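
A fuller sketch with a separate Solr home directory and a smaller heap per 
instance (the paths and heap size are illustrative; -s sets the Solr home and 
-m the heap):

bin/solr start -p 8983 -s /var/solr/node1 -m 8g
bin/solr start -p 8984 -s /var/solr/node2 -m 8g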



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple solr instances on one server

2016-01-04 Thread philippa griggs
We store a huge amount of data across 10 shards and are getting to a point 
where we keep having to increase the heap to stop Solr from crashing. We are 
trying to keep the heap size down, and plan to host multiple Solr instances on 
each server, each with a much smaller heap.

From: Mugeesh Husain 
Sent: 04 January 2016 16:01
To: solr-user@lucene.apache.org
Subject: Re: Multiple solr instances on one server

You could use the inbuilt (internal) Jetty in production; it depends on your 
requirements.

If you want to use another container, Tomcat would be the best.

Please elaborate on your requirement: why do you want to run multiple instances 
on a single server?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection API migrate statement

2015-12-16 Thread philippa griggs
Hello,

Thanks for your reply.  

As you suggested, I've tried running the operation along with the async 
parameter and it works, thank you. My next question: is there any way of 
finding out more information about the completed task? As I'm currently testing 
the new Solr configuration, it would be handy to know the runtime of the 
operation.

Many thanks

Philippa


From: Shalin Shekhar Mangar <shalinman...@gmail.com>
Sent: 15 December 2015 19:05
To: solr-user@lucene.apache.org
Subject: Re: Collection API migrate statement

The migrate is a long-running operation. Please use it along with the 
async=<requestId> parameter so that it can execute in the background. Then you 
can use the request status API to poll and wait until the operation completes. 
If there is any error, the same request status API will return it in the 
response. See
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus
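
For example (the request id 1000, the collection names and the split.key are 
placeholders):

/admin/collections?action=MIGRATE&collection=SourceCollection&split.key=KEY!&target.collection=TargetCollection&async=1000
/admin/collections?action=REQUESTSTATUS&requestid=1000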

On Tue, Dec 15, 2015 at 9:27 PM, philippa griggs
<philippa.gri...@hotmail.co.uk> wrote:
> Hello,
>
>
> Solr 5.2.1.
>
>
> I'm using the collection API migrate statement in our test environment with 
> the view to implement a Hot, Cold arrangement- newer documents will be kept 
> on the Hot collection and each night the oldest documents will be migrated 
> into the Cold collection. I've got it all working with a small amount of 
> documents (around 28,000).
>
>
> I'm now trying to migrate around 200,000 documents and am getting 'migrate 
> the collection time out:180s'  message back.
>
>
> The logs from the source collection are:
>
>
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created 
> replica of temp source collection on target leader node
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp 
> source collection replica to target leader
> INFO  - 2015-12-15 14:45:36.648; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on 
> path /overseer/collection-queue-work/qnr-04 state SyncConnected
> INFO  - 2015-12-15 14:45:36.651; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/collection-queue-work state SyncConnected
> ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: migrate the collection time out:180s
> at 
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
> etc
>
>
> The logs from the target collection are:
>
> INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; 
> Took 22 ms to seed version buckets with highest version 1520634636692094979
> INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] 
> org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. 
> core=split_shard1_temp_shard2_shard1_replica2
> INFO  - 2015-12-15 14:43:19.199; [   ] 
> org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}
>
> As there are no errors in the target collection, am I right in assuming the 
> timeout occured because the merge took too long? If that is so, how to I 
> increase the timeout period? Ideally I will need to migrate around 2 million 
> documents a night.
>
>
> Any help would be much appreciated.
>
>
> Philippa
>
>



--
Regards,
Shalin Shekhar Mangar.


Collection API migrate statement

2015-12-15 Thread philippa griggs
Hello,


Solr 5.2.1.


I'm using the Collection API migrate statement in our test environment with a 
view to implementing a hot/cold arrangement: newer documents will be kept in 
the Hot collection and each night the oldest documents will be migrated into 
the Cold collection. I've got it all working with a small number of documents 
(around 28,000).


I'm now trying to migrate around 200,000 documents and am getting a 'migrate 
the collection time out:180s' message back.


The logs from the source collection are:


INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created replica 
of temp source collection on target leader node
INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp 
source collection replica to target leader
INFO  - 2015-12-15 14:45:36.648; [   ] 
org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on path 
/overseer/collection-queue-work/qnr-04 state SyncConnected
INFO  - 2015-12-15 14:45:36.651; [   ] 
org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged fired 
on path /overseer/collection-queue-work state SyncConnected
ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: migrate the collection time out:180s
at 
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
etc


The logs from the target collection are:

INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  
split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; 
Took 22 ms to seed version buckets with highest version 1520634636692094979
INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  
split_shard1_temp_shard2_shard1_replica2] 
org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. 
core=split_shard1_temp_shard2_shard1_replica2
INFO  - 2015-12-15 14:43:19.199; [   ] 
org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}

As there are no errors in the target collection, am I right in assuming the 
timeout occurred because the merge took too long? If so, how do I increase the 
timeout period? Ideally I will need to migrate around 2 million documents a 
night.


Any help would be much appreciated.


Philippa




Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread philippa griggs
Hello Erick,

Thanks for your reply.  

We have one collection and are writing documents to that collection all the 
time: it peaks at around 2,500 documents per minute and dips to 250 per minute, 
and the size of the documents varies. On each node we have around 55,000,000 
documents with a data size of 43GB, located on a 200GB drive.

Each node has 122GB of memory; the heap size is currently set to 45GB, although 
we have plans to increase this to 50GB.

The heap settings we are using are:

-XX:+UseG1GC -XX:+ParallelRefProcEnabled
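
In solr.in.sh this corresponds to something like the following (heap values as 
above; SOLR_JAVA_MEM and GC_TUNE are the variable names in the stock 5.x 
solr.in.sh, so adjust if your install differs):

SOLR_JAVA_MEM="-Xms45g -Xmx45g"
GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled"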

Please let me know if you need any more information.

Philippa

From: Erick Erickson <erickerick...@gmail.com>
Sent: 07 December 2015 16:53
To: solr-user
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Tell us a bit more.

Are you adding documents to your collections or adding more
collections? Solr is a balancing act between the number of docs you
have on each node and the memory you have allocated. If you're
continually adding docs to Solr, you'll eventually run out of memory
and/or hit big GC pauses.

How much memory are you allocating to Solr? How much physical memory
do you have? etc.

Best,
Erick


On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
<philippa.gri...@hotmail.co.uk> wrote:
> Hello,
>
>
> I'm using:
>
>
> Solr 5.2.1 10 shards each with a replica. (20 nodes in total)
>
>
> Zookeeper 3.4.6.
>
>
> About half a year ago we upgraded to Solr 5.2.1 and since then have been 
> experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
> will go down. Sometimes they will recover by themselves but more often than 
> not we have to step in to restart nodes.
>
>
> Nothing in the logs jumps out as being the problem. With the latest wipe out 
> we noticed that 10 out of the 20 nodes had garbage collections over 1min all 
> at the same time, with the heap usage spiking up in some cases to 80%. We 
> also noticed the amount of selects run on the solr cluster increased just 
> before the wipe out.
>
>
> Increasing the heap size seems to help for a while but then it starts 
> happening again- so its more like a delay than a fix. Our GC settings are set 
> to -XX: +UseG1GC, -XX:+ParallelRefProcEnabled.
>
>
> With our previous version of solr (4.10.0) this didn't happen. We had 
> nodes/shards go down but it was contained, with the new version they all seem 
> to go at around the same time. We can't really continue just increasing the 
> heap size and would like to solve this issue rather than delay it.
>
>
> Has anyone experienced something simular?
>
> Is there a difference between the two versions around the recovery process?
>
> Does anyone have any suggestions on a fix.
>
>
> Many thanks
>
>
> Philippa
>

Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-08 Thread philippa griggs
Hello Emir,

The query load is around 35 requests per minute on each shard; we don't use 
document routing, so we query the entire index.

We do have some heavy queries like faceting, and it's possible that a heavy 
query is causing the nodes to go down; we are looking into this. I'm new to 
Solr, so this could be a slightly stupid question, but would a heavy query 
cause most of the nodes to go down? This didn't happen with the previous Solr 
version we were using (4.10.0); we did have nodes/shards go down, but there 
wasn't the wipe-out effect where most of the nodes go at once.

Many thanks

Philippa 


From: Emir Arnautovic <emir.arnauto...@sematext.com>
Sent: 08 December 2015 10:38
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

Hi Philippa,
My guess would be that you are running some heavy queries (faceting/deep
paging/large pages), or have a high query load (can you give a bit more detail
about the load?), or have misconfigured caches. Do you query the entire index
or do you have query routing?

You have big machines and might consider running two Solr instances on each
node (with a smaller heap) and splitting the shards, so that queries can be
more parallelized, resources are better utilized, and the heap is smaller for
GC to manage.

Regards,
Emir

On 08.12.2015 10:49, philippa griggs wrote:
> Hello Erick,
>
> Thanks for your reply.
>
> We have one collection and are writing documents to that collection all the 
> time- it peaks at around 2,500 per minute and dips to 250 per minute,  the 
> size of the document varies. On each node we have around 55,000,000 documents 
> with a data size of 43G located on a drive of 200G.
>
> Each node has 122G memory, the heap size is currently set at 45G although we 
> have plans to increase this to 50G.
>
> The heap settings we are using are:
>
>   -XX: +UseG1GC,
> -XX:+ParallelRefProcEnabled.
>
> Please let me know if you need any more information.
>
> Philippa
> 
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: 07 December 2015 16:53
> To: solr-user
> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>
> Tell us a bit more.
>
> Are you adding documents to your collections or adding more
> collections? Solr is a balancing act between the number of docs you
> have on each node and the memory you have allocated. If you're
> continually adding docs to Solr, you'll eventually run out of memory
> and/or hit big GC pauses.
>
> How much memory are you allocating to Solr? How much physical memory
> to you have? etc.
>
> Best,
> Erick
>
>
> On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
> <philippa.gri...@hotmail.co.uk> wrote:
>> Hello,
>>
>>
>> I'm using:
>>
>>
>> Solr 5.2.1 10 shards each with a replica. (20 nodes in total)
>>
>>
>> Zookeeper 3.4.6.
>>
>>
>> About half a year ago we upgraded to Solr 5.2.1 and since then have been 
>> experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
>> will go down. Sometimes they will recover by themselves but more often than 
>> not we have to step in to restart nodes.
>>
>>
>> Nothing in the logs jumps out as being the problem. With the latest wipe out 
>> we noticed that 10 out of the 20 nodes had garbage collections over 1min all 
>> at the same time, with the heap usage spiking up in some cases to 80%. We 
>> also noticed the amount of selects run on the solr cluster increased just 
>> before the wipe out.
>>
>>
>> Increasing the heap size seems to help for a while but then it starts 
>> happening again- so its more like a delay than a fix. Our GC settings are 
>> set to -XX: +UseG1GC, -XX:+ParallelRefProcEnabled.
>>
>>
>> With our previous version of solr (4.10.0) this didn't happen. We had 
>> nodes/shards go down but it was contained, with the new version they all 
>> seem to go at around the same time. We can't really continue just increasing 
>> the heap size and would like to solve this issue rather than delay it.
>>
>>
>> Has anyone experienced something simular?
>>
>> Is there a difference between the two versions around the recovery process?
>>
>> Does anyone have any suggestions on a fix.
>>
>>
>> Many thanks
>>
>>
>> Philippa
> >

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-07 Thread philippa griggs
Hello,


I'm using:


Solr 5.2.1, 10 shards each with a replica (20 nodes in total).


Zookeeper 3.4.6.


About half a year ago we upgraded to Solr 5.2.1 and since then have been 
experiencing a 'wipe-out' effect where, all of a sudden, most if not all nodes 
go down. Sometimes they recover by themselves, but more often than not we have 
to step in and restart nodes.


Nothing in the logs jumps out as being the problem. With the latest wipe-out we 
noticed that 10 out of the 20 nodes had garbage collections of over 1 minute 
all at the same time, with the heap usage spiking in some cases to 80%. We also 
noticed that the number of selects run on the Solr cluster increased just 
before the wipe-out.


Increasing the heap size seems to help for a while, but then it starts 
happening again, so it's more of a delay than a fix. Our GC settings are 
-XX:+UseG1GC -XX:+ParallelRefProcEnabled.


With our previous version of Solr (4.10.0) this didn't happen. We had 
nodes/shards go down but it was contained; with the new version they all seem 
to go at around the same time. We can't really continue just increasing the 
heap size and would like to solve this issue rather than delay it.


Has anyone experienced something similar?

Is there a difference between the two versions around the recovery process?

Does anyone have any suggestions for a fix?


Many thanks


Philippa



Re: Protect against duplicates with the Migrate statement

2015-12-02 Thread philippa griggs
I used two fields to set up the signature: the unique ID and a timestamp field.

As it's in test, I set it up, cleared all the data out of both collections and 
reloaded it. I could see the signature that was created. I then migrated into 
the cold collection, which already had documents with the same unique ID and 
signature.
I ended up with duplicates in the cold collection.
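
For reference, the update processor chain I set up in solrconfig.xml looks 
roughly like this (the signatureField name and the two source fields are 
placeholders for my real field names):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">unique_id,timestamp</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>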

Thanks for your help,

Philippa


From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Sent: 03 December 2015 02:30:31
To: solr-user@lucene.apache.org
Subject: Re: Protect against duplicates with the Migrate statement

Hi Philippa,

Which field did you use to set it as SignatureField in your ColdDocuments
when you implement the de-duplication?

Regards,
Edwin


On 2 December 2015 at 18:59, philippa griggs <philippa.gri...@hotmail.co.uk>
wrote:

> Hello,
>
>
> I'm using Solr 5.2.1 and Zookeeper 3.4.6.
>
>
> I'm implementing two collections - HotDocuments and ColdDocuments . New
> documents will only be written to HotDocuments and every night I will
> migrate a chunk of documents into ColdDocuments.
>
>
> In the test environment, I have the Collection API migrate statement
> working fine. I know this won't handle duplicates ending up in the
> ColdDocuments collection and I don't expect to have duplicate documents but
> I would like to protect against it- just in case.
>
>
> We have a unique key and I've tried to implement de-duplication (
> https://cwiki.apache.org/confluence/display/solr/De-Duplication) but I
> still end up with duplicates in the ColdDocuments collection.
>
>
>
> Does anyone have any suggestions on how I can protect against duplicates
> with the migrate statement?  Any ideas would be greatly appreciated.
>
>
> Many thanks
>
> Philippa
>


Protect against duplicates with the Migrate statement

2015-12-02 Thread philippa griggs
Hello,


I'm using Solr 5.2.1 and Zookeeper 3.4.6.


I'm implementing two collections, HotDocuments and ColdDocuments. New documents 
will only be written to HotDocuments, and every night I will migrate a chunk of 
documents into ColdDocuments.


In the test environment, I have the Collection API migrate statement working 
fine. I know this won't handle duplicates ending up in the ColdDocuments 
collection, and I don't expect to have duplicate documents, but I would like to 
protect against it just in case.


We have a unique key and I've tried to implement de-duplication 
(https://cwiki.apache.org/confluence/display/solr/De-Duplication) but I still 
end up with duplicates in the ColdDocuments collection.



Does anyone have any suggestions on how I can protect against duplicates with 
the migrate statement?  Any ideas would be greatly appreciated.


Many thanks

Philippa


Solr Collections Migrate statement

2015-11-24 Thread philippa griggs
Hello,


I'm using Solr 5.2.1 and Zookeeper 3.4.6.


I have a quick question about the Solr Collection API and the migrate statement.


I've got a test environment working with two shards and two collections. When 
I run the migrate command, the documents appear in the target collection no 
problem; however, I'm still seeing them in the source collection. For some 
reason I assumed that they would be deleted from the source when moved across.


I'm just wondering what the expected behaviour should be: does the source 
collection stay exactly the same, or should the migrated documents be deleted?


Many thanks

Philippa


Re: Solr Collections Migrate statement

2015-11-24 Thread philippa griggs
Thank you. Just wanted to double check.

Philippa


From: Shalin Shekhar Mangar <shalinman...@gmail.com>
Sent: 24 November 2015 13:48
To: solr-user@lucene.apache.org
Subject: Re: Solr Collections Migrate statement

Hi Philippa,

This is the expected behavior. The old documents stay in the source
collection so you can verify the result of the migrate and then delete them
yourself.
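
For example, something along these lines once the migrated documents have been 
verified in the target collection (the collection name and the query used to 
select the migrated documents are placeholders):

curl "http://localhost:8983/solr/SourceCollection/update?commit=true" -H "Content-Type: text/xml" \
  -d "<delete><query>routing_field:[START TO END]</query></delete>"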

On Tue, Nov 24, 2015 at 6:01 PM, philippa griggs
<philippa.gri...@hotmail.co.uk> wrote:
> Hello,
>
>
> I'm using Solr 5.2.1 and Zookeeper 3.4.6.
>
>
> I have a quick question about the Solr Collection API and the migrate 
> statement.
>
>
> I've got a test environment working with two shards and two collections. When 
> I run the Migrate command the documents appear on the target collection no 
> problem, however I'm still seeing them on the source collection. For some 
> reason I assumed that they would be deleted from the source when moved across.
>
>
> Just wondering what the expected behaviour should be- does the source 
> collection stay exactly the same or should the migrated documents be deleted?
>
>
> Many thanks
>
> Philippa



--
Regards,
Shalin Shekhar Mangar.


Re: API collection, Migrate deleting all data

2015-11-04 Thread philippa griggs
Hello Alessandro, 

An example of a document:

{
"SolrShard": "05/10/2015!02bebd2e-f12b-4787-bef2-57b0c5ed42ee",
"ContentVersion": 11,
"Session_Browser": "IE 11",
"Session_UserAgent": [
  "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
],
"Session_OS": "Windows 7",
"_version_": 1516909630614143000
  },

With the unique key being SolrShard:


"SolrShard": "05/10/2015!02bebd2e-f12b-4787-bef2-57b0c5ed42ee"

I verified this by loading two dates; the document routing worked as follows:

Shard 1:  05/10/2015 data 
Shard 2:  02/10/2015 data 

using the query:

/solr/sessionfilterset/select?q=*%3A*&wt=json&indent=true&_route_=05/10/2015!

returned the expected results. 

However, I've now got more data loaded:

Shard 1: has 05/10/2015 data on it
Shard 2: has 02/10/2015 and 01/10/2015 data on it.

When I query  

/solr/sessionfilterset/select?q=*%3A*&wt=json&indent=true&_route_=02/10/2015!

I get results from both 02/10/2015 and 01/10/2015. 

For these documents the unique key is correct, with either 02/10/2015! or 
01/10/2015! before the GUID.

Any ideas?

Thanks for your help.







From: Alessandro Benedetti <abenede...@apache.org>
Sent: 04 November 2015 12:29
To: solr-user@lucene.apache.org
Subject: Re: API collection, Migrate deleting all data

Hi Philippa,
Can you show us an example of a document?
In particular, I would like to see the ID you are using.
I would expect a compositeId in the form:
shardkey!id

Have you verified, first of all, that the compositeId routing and shard key are
currently working?
This is the first step, as I think the parameters you are using are OK, but I'm
just wondering if something is wrong with the compositeId you are using.

Cheers

On 4 November 2015 at 11:40, philippa griggs <philippa.gri...@hotmail.co.uk>
wrote:

> Hello,
>
>
> Solr 5.2.1, Zookeeper 3.4.6
>
>
> I'm trying the use the solr Collection API to migrate documents in a test
> environment. I have two collections set up
>
>
> HotSessions - two shards, no replicas
>
> ColdSessions - 1 shard, no replicas.
>
>
> I've upload some sample data and using document routing with a split.key
> of the date e.g. 05/10/2015!.  The command I use to migrate is:
>
>
>
> /admin/collections?action=MIGRATE&collection=HotSessions&split.key=05/10/2015!&target.collection=ColdSessions&forward.timeout=60
>
>
> This returns the following response (the XML tags have been stripped by the
> mail archive, leaving the status/QTime pairs and values):
>
> status=0, QTime=14608
> status=0, QTime=1     ColdSessions  BUFFERING
> status=0, QTime=1818  split_shard2_temp_shard1_shard1_replica1
> status=0, QTime=1077
> status=0, QTime=43
> status=0, QTime=1371  split_shard2_temp_shard1_shard1_replica2
> status=0, QTime=9014
> status=0, QTime=70
> status=0, QTime=0     ColdSessions  EMPTY_BUFFER
> status=0, QTime=30
> status=0, QTime=31
>
>
> In the error logs there is the migrate message
>
>
> OverseerCollectionProcessor.processMessage : migrate , {
>   "collection":"HotSessions",
>   "split.key":"02/10/2015!",
>   "target.collection":"ColdSessions",
>   "forward.timeout":"60",
>   "operation":"migrate"}
>
>
> Followed by:
>
> -WARN 'no frame of reference to tell if we've missed updates'
>
> -ERROR 'Error removing directory:java.io.IOException: Failed to list
> contents of /solr/lost+found
>
> -WARN 'Our node is no longer in line to be leader'
>
> -ERROR 'auto commit error...: java.nio.file.NoSuchFileException:
> /solr/ColdSessions/index/pending_segments_3'
>
> -ERROR 'IO error while trying to get the size of the Directory:
> java.nio.file.NoSuchFileException: /Solr/ColdSessions/Index'
>
> -ERROR 'IO error while trying to get the size of the Directory:
> java.nio.file.NoSuchFileException: /Solr/HotSessions/Index'
>
> -ERROR 'IO error while trying to get the size of the Directory:
> java.nio.file.NoSuchFileException: /Solr/HotSessions/ind'
>
>
>
>
> When I  query solr- all my session have gone, from both collections.
> Looking at the file structure, in my data folder all files have gone apart
> from a lost+found folder.
>
>
> What am I doing wrong? Is there anything in my setup which would cause
> this or is my API call wrong?
>
>
> Any help would be much appreciated.
>
>
> Philippa
>
>
>
>
>
>


--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

API collection, Migrate deleting all data

2015-11-04 Thread philippa griggs
Hello,


Solr 5.2.1, Zookeeper 3.4.6


I'm trying to use the Solr Collection API to migrate documents in a test 
environment. I have two collections set up:


HotSessions - two shards, no replicas

ColdSessions - 1 shard, no replicas.


I've uploaded some sample data and am using document routing with a split.key 
of the date, e.g. 05/10/2015!. The command I use to migrate is:


/admin/collections?action=MIGRATE&collection=HotSessions&split.key=05/10/2015!&target.collection=ColdSessions&forward.timeout=60


This returns the following response (the XML tags have been stripped by the 
mail archive, leaving the status/QTime pairs and values):

status=0, QTime=14608
status=0, QTime=1     ColdSessions  BUFFERING
status=0, QTime=1818  split_shard2_temp_shard1_shard1_replica1
status=0, QTime=1077
status=0, QTime=43
status=0, QTime=1371  split_shard2_temp_shard1_shard1_replica2
status=0, QTime=9014
status=0, QTime=70
status=0, QTime=0     ColdSessions  EMPTY_BUFFER
status=0, QTime=30
status=0, QTime=31






In the error logs there is the migrate message


OverseerCollectionProcessor.processMessage : migrate , {
  "collection":"HotSessions",
  "split.key":"02/10/2015!",
  "target.collection":"ColdSessions",
  "forward.timeout":"60",
  "operation":"migrate"}


Followed by:

-WARN 'no frame of reference to tell if we've missed updates'

-ERROR 'Error removing directory:java.io.IOException: Failed to list contents 
of /solr/lost+found

-WARN 'Our node is no longer in line to be leader'

-ERROR 'auto commit error...: java.nio.file.NoSuchFileException: 
/solr/ColdSessions/index/pending_segments_3'

-ERROR 'IO error while trying to get the size of the Directory: 
java.nio.file.NoSuchFileException: /Solr/ColdSessions/Index'

-ERROR 'IO error while trying to get the size of the Directory: 
java.nio.file.NoSuchFileException: /Solr/HotSessions/Index'

-ERROR 'IO error while trying to get the size of the Directory: 
java.nio.file.NoSuchFileException: /Solr/HotSessions/ind'




When I query Solr, all my sessions have gone from both collections. Looking at 
the file structure, all the files in my data folder have gone, apart from a 
lost+found folder.


What am I doing wrong? Is there anything in my setup which would cause this or 
is my API call wrong?


Any help would be much appreciated.


Philippa







Zookeeper issue causing all nodes to fail

2015-10-26 Thread philippa griggs
Hello all,


We have been experiencing some major solr issues.


Solr 5.2.1 10 Shards each with a replica (20 nodes in total).

Three external zookeepers 3.4.6


Node 19 went down, and a short while after this occurred all our nodes were 
wiped out. The cloud diagram, live_nodes and clusterstate.json all showed 
different nodes as being down/active, and the picture changed each time we 
refreshed.


Looking at the logs across all the nodes, there were zookeeper errors (a couple 
of examples below); however, there were no out-of-the-ordinary errors in 
zookeeper itself.



WARN  - 2015-10-22 09:39:48.536; [   ] org.apache.solr.cloud.ZkController$4; 
listener throws error

org.apache.solr.common.SolrException: 
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /configs/XXX/params.json

   at 
org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:163)

   at 
org.apache.solr.core.SolrConfig.refreshRequestParams(SolrConfig.java:926)

   at org.apache.solr.core.SolrCore$11.run(SolrCore.java:2580)

   at org.apache.solr.cloud.ZkController$4.run(ZkController.java:2376)

Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /configs/XXX/params.json

   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)

   at 
org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:294)

   at 
org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:291)

   at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)

   at 
org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:291)

   at 
org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:153)

   ... 3 more


ERROR - 2015-10-26 11:28:13.141; [XXX shard6  ] 
org.apache.solr.common.SolrException; There was a problem trying to register as 
the leader:org.apache.solr.common.SolrException: Could not register as the 
leader because creating the ephemeral registration node in ZooKeeper failed

   at 
org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:154)

   at 
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:330)

   at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:198)

   at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:159)

   at 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:348)

   at 
org.apache.solr.cloud.ZkController.joinElection(ZkController.java:1075)

   at org.apache.solr.cloud.ZkController.register(ZkController.java:888)

   at 
org.apache.solr.cloud.ZkController$RegisterCoreAsync.call(ZkController.java:226)

   at java.util.concurrent.FutureTask.run(Unknown Source)

   at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)

   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   at java.lang.Thread.run(Unknown Source)



ERROR - 2015-10-26 11:29:45.757; [XXX shard6  ] 
org.apache.solr.common.SolrException; Error while trying to recover. 
core=XXX:org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /overseer/queue/qn-

   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

   at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)

   at 
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:380)

   at 
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:377)

   at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)

   at 
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:377)

   at 
org.apache.solr.cloud.DistributedQueue.createData(DistributedQueue.java:380)

   at 
org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:364)

   at org.apache.solr.cloud.ZkController.publish(ZkController.java:1219)

   at org.apache.solr.cloud.ZkController.publish(ZkController.java:1129)

   at org.apache.solr.cloud.ZkController.publish(ZkController.java:1125)

   at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:348)

   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:229)




WARN  - 2015-10-26 11:32:03.116; [   XXX] org.apache.solr.cloud.ZkController$4; 
listener throws error

org.apache.solr.common.SolrException: Unable to reload core [XXX]

   at