You do that with schema changes and I’ll watch your site crash.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 1, 2014, at 8:31 PM, Will Martin <wmartin...@gmail.com> wrote:

> Well yes. But since there hasn't been any devops approach yet, we really
> aren't talking about Continuous Delivery. Continually delivering builds into
> production is old hat, and Jack nailed the canonical ways in which it has
> been done. It really depends on whether an org is investing in the full
> Agile lifecycle. A piece at a time is common.
> 
> One possible devops approach, once you get near full test automation:
> 
> : Jenkins builds the target
> : chef does due diligence on dependencies
> : chef pulls the build over
> : chef configures the build once it is installed
> : chef takes the machine out of the load-balancer's rotation
> : chef puts the machine back in once it is launched and sanity tested (by
> chef)
> 
> <or puppet or any others I'm not familiar with>
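The pipeline above could be sketched as a plain deploy script (a minimal sketch only; the node name is hypothetical, and every function body is a placeholder for whatever chef/puppet would actually drive against your load balancer and hosts):

```shell
#!/bin/sh
# Sketch of the single-node pipeline described above. Each step that
# would touch real infrastructure is a placeholder function; swap in
# your own LB API calls, package installs, and health checks.
set -e

NODE="solr-node-1"    # hypothetical node name

lb_remove()    { echo "removing $1 from load-balancer rotation"; }
deploy_build() { echo "pulling and configuring build on $1"; }
sanity_test()  { echo "sanity testing $1"; }
lb_add()       { echo "returning $1 to load-balancer rotation"; }

lb_remove "$NODE"
deploy_build "$NODE"
sanity_test "$NODE"
lb_add "$NODE"
```

Because each function is idempotent in the real tooling, rerunning the script after a partial failure is safe.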
> 
> 
> If you substitute Jack's plan, you get pretty much the same thing, except
> that by using devops tools you introduce a little thing called idempotency.
> 
> 
> 
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Saturday, November 01, 2014 12:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to update SOLR schema from continuous integration environment
> 
> Nice pictures, but that preso does not even begin to answer the question.
> 
> With master/slave replication, I do schema migration in two ways, depending
> on whether a field is added or removed.
> 
> Adding a field:
> 
> 1. Update the schema on the slaves. A defined field with no data is not a
> problem.
> 2. Update the master.
> 3. Reindex to populate the field and wait for replication.
> 4. Update the request handlers or clients to use the new field.
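For step 1, the change itself is often just a new field definition in schema.xml (the field name and type here are hypothetical, purely for illustration; the type must already exist in the schema):

```xml
<!-- hypothetical new field added to schema.xml -->
<field name="popularity" type="tfloat" indexed="true" stored="true"/>
```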
> 
> Removing a field is the opposite. I haven't tried lately, but Solr used to
> have problems with a field that was in the index but not in the schema.
> 
> 1. Update the request handlers and clients to stop using the field.
> 2. Reindex without any data for the field that will be removed, wait for
> replication.
> 3. Update the schema on the master and slaves.
> 
> I have not tried to automate this for continuous deployment. It isn't a big
> deal for a single server test environment. It is the prod deployment that is
> tricky.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
> 
> 
> On Nov 1, 2014, at 7:29 AM, Will Martin <wmartin...@gmail.com> wrote:
> 
>> 
>> http://www.thoughtworks.com/insights/blog/enabling-continuous-delivery-enterprises-testing
>> 
>> 
>> -----Original Message-----
>> From: Jack Krupansky [mailto:j...@basetechnology.com] 
>> Sent: Saturday, November 01, 2014 9:46 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to update SOLR schema from continuous integration environment
>> 
>> In all honesty, incrementally updating resources of a production server is
>> a rather frightening proposition. Parallel testing is always a better way to
>> go - bring up any changes in a parallel system for testing and then do an
>> atomic "swap" - redirection of requests from the old server to the new
>> server and then retire the old server only after the new server has had
>> enough time to burn in and get past any infant mortality problems.
>> 
>> That's production. Testing and dev? Who needs the hassle; just tear the
>> old server down and bring up the new server from scratch with all resources
>> updated from the get-go.
>> 
>> Oh, and the starting point would be keeping your full set of config and
>> resource files under source control so that you can carefully review changes
>> before they are "pushed", can compare different revisions, and can easily
>> back out a revision with confidence rather than "winging it."
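A minimal sketch of that workflow with git (the file and its contents are hypothetical; the point is that reviewing, comparing, and backing out a revision become one-liners):

```shell
#!/bin/sh
# Sketch: keep a Solr conf directory under git so changes can be
# reviewed, diffed, and reverted. Runs entirely in a scratch directory.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q conf && cd conf
git config user.email demo@example.com
git config user.name demo

echo "tv, television" > synonyms.txt
git add synonyms.txt && git commit -qm "initial synonyms"

echo "cell, mobile" >> synonyms.txt
git commit -qam "add cell/mobile synonym"

git log --oneline                        # review what changed
git diff HEAD~1 -- synonyms.txt          # compare revisions
git checkout -q HEAD~1 -- synonyms.txt   # back out the last change
cat synonyms.txt
```

After the checkout, synonyms.txt is back to its first committed state, and the history still records what was tried.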
>> 
>> That said, a lot of production systems these days are not designed for
>> parallel operation and swapping out parallel systems, especially for cloud
>> and cluster systems. In these cases the reality is more of a "rolling
>> update", where one node at a time is taken down, updated, brought up,
>> tested, brought back into production, tested some more, and only after
>> enough burn in time do you move to the next node.
>> 
>> This rolling update may also force you to sequence or stage your changes
>> so that old and new nodes are at least relatively compatible. So, the first
>> stage would update all nodes, one at a time, to the intermediate compatible
>> change, and only when that rolling update of all nodes is complete would you
>> move up to the next stage of the update to replace the intermediate update
>> with the final update. And maybe more than one intermediate stage is
>> required for more complex updates.
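The staged rolling update can be sketched as two passes over the cluster (node names and the per-node update are placeholders; a real version would call your LB, deployment, and health-check tooling):

```shell
#!/bin/sh
# Sketch of a two-stage rolling update: stage 1 rolls every node to an
# intermediate, backward-compatible config; stage 2 rolls the final
# config only after every node has completed stage 1.
set -e

NODES="node1 node2 node3"   # hypothetical node list

roll() {  # roll one node to the given config version
  node=$1; version=$2
  echo "$node: out of rotation, applying $version config, health check, back in"
}

for n in $NODES; do roll "$n" intermediate; done   # stage 1: all nodes
for n in $NODES; do roll "$n" final; done          # stage 2: all nodes
```

Note the second loop never starts until the first has finished every node, which is what keeps old and new nodes compatible at all times.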
>> 
>> Some changes might involve upgrading Java jars as well, in a way that
>> might cause nodes to give incompatible results, in which case you may need
>> to stage or sequence your Java changes as well, so that you don't make the
>> final code change until you have verified that all nodes are running
>> intermediate code that is compatible with both old nodes and new nodes.
>> 
>> Of course, it all depends on the nature of the update. For example, adding
>> more synonyms may or may not be harmless with respect to whether existing
>> index data becomes invalidated and each node needs to be completely
>> reindexed, or if query-time synonyms are incompatible with index-time
>> synonyms. Ditto for just about any analysis chain changes - they may be
>> harmless, they may require full reindexing, they may simply not work for new
>> data (i.e., a synonym is added in response to late-breaking news or an
>> addition to a taxonomy) until nodes are updated, or maybe some queries
>> become slightly or somewhat inaccurate until the update/reindex is complete.
>> 
>> So, you might want to have two stages of test system - one to just do a
>> raw functional test of the changes, like whether your new synonyms work as
>> expected or not, and then the pre-production stage which would be updated
>> using exactly the same process as the production system, such as a rolling
>> update or staged rolling update as required. The closer that pre-production
>> system is run to the actual production, the greater the odds that you can
>> have confidence that the update won't compromise the production system.
>> 
>> The pre-production test system might have, say, 10% of the production data
>> and be only 10% the size of the production system.
>> 
>> In short, for smaller clusters having parallel systems with an atomic
>> swap/redirection is probably simplest, while for larger clusters an
>> incremental rolling update with thorough testing on a pre-production test
>> cluster is the way to go.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message-----
>> From: Faisal Mansoor
>> Sent: Saturday, November 1, 2014 12:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: How to update SOLR schema from continuous integration environment
>> 
>> Hi,
>> 
>> How do people usually update Solr configuration files from a continuous
>> integration environment like TeamCity or Jenkins?
>> 
>> We have multiple development and testing environments and use WebDeploy
>> and AwsDeploy type tools to remotely deploy code multiple times a day. To
>> update Solr, I wrote a simple node server which accepts a conf folder over
>> http, updates the specified core's conf folder, and restarts the Solr
>> service.
>> 
>> Does there exist a standard tool for this use case? I know about the
>> Schema REST API, but I want to update all the files in the conf folder
>> rather than just updating a single file or adding or removing synonyms
>> piecemeal.
>> 
>> Here is the link for the node server I mentioned if anyone is interested.
>> https://github.com/faisalmansoor/UpdateSolrConfig
>> 
>> 
>> Thanks,
>> Faisal 
>> 
>> 
> 
> 
