Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-09 Thread Erick Erickson
Glad to hear it. Now, if you want to be really bold, there is another approach 
(I haven’t verified it, but it _should_ work).

Rather than copy the index, try this:

1> spin up a one-replica empty collection
2> use the REPLICATION API to copy the index from the re-indexed source.
3> ADDREPLICAs as before.

<2> looks something like: 
http://_slave_host:port_/solr/_core_name_/replication?command=fetchindex&masterUrl=http://solr_with_new_index:port/solr/_core_name_/replication

_core_name_ in this case is something like collection1_shard1_replica1, i.e. 
what shows up in the “cores” dropdown.

The replication API is still used by SolrCloud for “full sync” and has been 
around forever, so it’s well-tested. Again, though, I don’t use this regularly, 
so no guarantees.

See: https://lucene.apache.org/solr/guide/7_5/index-replication.html
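
A minimal sketch of how the fetchindex URL in step <2> could be assembled, assuming the source core is passed through the replication handler's masterUrl parameter. The host names, port and core names in the example are hypothetical placeholders, and the function only builds the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def fetchindex_url(target_base, target_core, source_base, source_core):
    """Build the fetchindex URL to call on the core that should RECEIVE the index.

    target_* is the empty one-replica core in the cloud,
    source_* is the stand-alone core holding the re-indexed data.
    """
    source_replication = f"{source_base}/solr/{source_core}/replication"
    # urlencode percent-encodes the nested URL so it survives as one parameter.
    query = urlencode({"command": "fetchindex", "masterUrl": source_replication})
    return f"{target_base}/solr/{target_core}/replication?{query}"


if __name__ == "__main__":
    # Hypothetical hosts and core names, for illustration only.
    print(fetchindex_url(
        "http://cloud-node1:8983", "collection1_shard1_replica1",
        "http://reindex-host:8983", "core1",
    ))
```

The resulting URL can be requested with curl or a browser once the empty collection exists.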

Best,
Erick

> On Apr 9, 2019, at 12:38 AM, kevinc  wrote:
> 
> Thanks so much - your approaches worked a treat!
> 
> Best,
> Kevin.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-09 Thread kevinc
Thanks so much - your approaches worked a treat!

Best,
Kevin.





Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Shawn Heisey

On 4/8/2019 10:06 AM, Shawn Heisey wrote:

* Make sure you have a copy of the source index directory.
* Do not copy the tlog directory from the source.
* Create the collection in the target cloud.
* Shut down the target cloud completely.
* Delete all the index directories in the cloud.
* Copy the source index directory to one of the cloud nodes.
* Start that cloud node up.  Make sure it is all working.
* Start up the other nodes.


At the "delete all the index directories in the cloud" step, I should 
have written "delete the contents of all data directories for the 
collection in the cloud" ... everything in data should be deleted, not 
just the index directory.  Don't want it replaying transaction logs when 
Solr starts!
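
The corrected step ("delete the contents of all data directories") could be sketched as below. This is an illustration only: the directory layout in the demo is a throwaway stand-in for a core's data directory, and on a real node it should only be run with Solr shut down.

```python
import shutil
import tempfile
from pathlib import Path


def clear_data_dir(data_dir):
    """Delete everything inside data_dir (index, tlog, index.properties, ...)
    but keep the data directory itself. Returns the removed entry names."""
    data_dir = Path(data_dir)
    removed = []
    for entry in data_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real sub-directory: remove recursively
        else:
            entry.unlink()         # plain file or symlink
        removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Demonstrate on a temporary directory rather than a real Solr core.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        (data / "index").mkdir(parents=True)
        (data / "tlog").mkdir()
        (data / "index.properties").write_text("index=index\n")
        print(clear_data_dir(data))
```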


Thanks,
Shawn


Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Shawn Heisey

On 4/8/2019 8:59 AM, kevinc wrote:

I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
cluster with 1 shard and replication factor of 3.

I want to copy over the index and have it replicate to the rest of the
cluster. I have taken a copy of the data directory from the reprocessed core
and copied it into the leader's data directory. This shows up correctly as
having a 51GB index and the documents are searchable.

I have tried the following curl commands to kick off replication:

curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type: text/xml" --data-binary @test.xml
curl http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E


I think the following is probably what you're going to want to do in 
order to transplant an existing index into a new cloud:


* Make sure you have a copy of the source index directory.
* Do not copy the tlog directory from the source.
* Create the collection in the target cloud.
* Shut down the target cloud completely.
* Delete all the index directories in the cloud.
* Copy the source index directory to one of the cloud nodes.
* Start that cloud node up.  Make sure it is all working.
* Start up the other nodes.

Once the other nodes are started, they will automatically notice that 
they don't have an index directory and will copy the index from the leader.


These instructions assume a single shard in both the source and the 
target.  If you are changing the number of shards, it will be a lot 
easier to simply reindex into the new cloud.
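
The "copy the index but not the tlog" part of the steps above might look like the following sketch. It assumes the source core keeps its segments in a plain index sub-directory (not a timestamped index.* directory named in index.properties); the paths in the demo are temporary stand-ins, and the copy should be done while the target node is stopped.

```python
import shutil
import tempfile
from pathlib import Path


def copy_index_only(source_data_dir, target_data_dir):
    """Copy <source>/index to <target>/index, leaving tlog and everything else behind."""
    source_index = Path(source_data_dir) / "index"
    target_index = Path(target_data_dir) / "index"
    if not source_index.is_dir():
        raise FileNotFoundError(f"no index directory in {source_data_dir}")
    if target_index.exists():
        # Refuse to merge into an existing index; clear the data directory first.
        raise FileExistsError(f"{target_index} already exists")
    shutil.copytree(source_index, target_index)
    return target_index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source" / "data"
        dst = Path(tmp) / "target" / "data"
        (src / "index").mkdir(parents=True)
        (src / "index" / "segments_1").write_text("placeholder")
        (src / "tlog").mkdir()
        dst.mkdir(parents=True)
        copy_index_only(src, dst)
        print(sorted(p.name for p in dst.iterdir()))
```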


Erick's message indicates another way you could go ... create the new 
index with a single replica, get that working, and then use ADDREPLICA 
(part of the Collections API) to add more replicas.


Thanks,
Shawn


Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

2019-04-08 Thread Erick Erickson
Here’s what I’d do:

1> Just spin up a _one_ node cluster, copy the index from your offline 
process and start Solr. I’d probably do the copy with Solr down.
2> Use the ADDREPLICA command to build out that cluster. The index copy 
associated with ADDREPLICA is robust. If you have any concerns about saturating 
your network, I’d wait until each replica showed green before adding the next 
one; if you add the replicas all at once, you’ll have N simultaneous copies of 
the 50G index.
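
A rough sketch of step 2>, assuming the Collections API ADDREPLICA action and a CLUSTERSTATUS response in its usual JSON shape (cluster -> collections -> shards -> replicas -> state). The host, collection and node names are hypothetical; the code only builds the request URL and inspects an already-fetched status document, so the polling loop around it is left out.

```python
from urllib.parse import urlencode


def addreplica_url(solr_base, collection, shard, node=None):
    """Build a Collections API ADDREPLICA URL; node is optional."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node:
        params["node"] = node
    return f"{solr_base}/solr/admin/collections?{urlencode(params)}"


def all_replicas_active(cluster_status, collection):
    """True when every replica of the collection reports state 'active'
    (shown as green in the admin UI). cluster_status is parsed CLUSTERSTATUS JSON."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return all(
        replica["state"] == "active"
        for shard in shards.values()
        for replica in shard["replicas"].values()
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(addreplica_url("http://node1:8983", "solrCollection1", "shard1",
                         node="node2:8983_solr"))
```

Adding one replica, waiting until all_replicas_active returns True, and only then adding the next keeps a single index copy on the network at a time.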

I’m not quite sure what’s happening in your situation; there are a lot of 
possibilities. The above should avoid most of the places where something 
could go wrong with your process.

Best,
Erick

> On Apr 8, 2019, at 7:59 AM, kevinc  wrote:
> 
> Hi all,
> 
> I'm sure I've done this before but this seems to be falling down a bit and I
> was wondering if anyone had any helpful ideas.
> 
> I have a large index (51GB) that exists in a 4 node Solr Cloud instance. The
> reprocessing for this takes a long time and so we normally reindex on a
> secondary cluster and swap them out.
> 
> I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
> cluster with 1 shard and replication factor of 3.
> 
> I want to copy over the index and have it replicate to the rest of the
> cluster. I have taken a copy of the data directory from the reprocessed core
> and copied it into the leader's data directory. This shows up correctly as
> having a 51GB index and the documents are searchable.
> 
> I have tried the following curl commands to kick off replication:
> 
> curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type: text/xml" --data-binary @test.xml
> curl http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E
> 
> I've tried this a few times and had a few different results:
> * The index gets set to 0 and has the single record I commit
> * A timed index gets created (index.201904082111232) and index.properties then
>   points to that
> * I had an issue with IndexWriter being closed
> * The index stays consistent and doesn't replicate
> 
> I've tried copying the index to both the leader and one other node to see if
> that helps but I'm faced with similar results as above.
> 
> Does anyone have any advice to how I can get this index moved and replicated
> onto this new cluster?
> 
> Thanks a lot!
> Kevin.
> 
> 
> 



Re: moving index

2007-09-27 Thread Yonik Seeley
On 9/27/07, Jae Joo [EMAIL PROTECTED] wrote:
 I do need to move the index files, but I have concerns about potential
 problems, including performance.
 Do I have to keep the original documents for querying?

I assume you posted XML documents in Solr XML format (like <add><doc>...)?
If so, that is just an example way to get the data into Solr.  Those
XML files aren't needed, and any high-speed indexing will avoid
creating files at all - just create the XML doc in memory and send to
solr via HTTP-POST.
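
As an illustration of building the XML document in memory and preparing an HTTP POST without writing any file, here is a small sketch. The field names and the update URL are hypothetical, and the request object is only constructed, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field_name: value} dicts as a Solr <add> message (bytes)."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)   # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="utf-8")


def build_update_request(update_url, docs):
    """Prepare (but do not send) the POST carrying the in-memory XML."""
    return urllib.request.Request(
        update_url,
        data=build_add_xml(docs),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical URL and fields; urllib.request.urlopen(request) would send it.
    request = build_update_request(
        "http://localhost:8983/solr/update",
        [{"id": "doc1", "title": "moving index"}],
    )
    print(request.get_method(), request.full_url, len(request.data), "bytes")
```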

-Yonik