[ https://issues.apache.org/jira/browse/KUDU-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Fancher reassigned KUDU-2100:
------------------------------------

    Assignee: Edward Fancher

> Verify Java client's behavior for tserver and master fail-over scenario
> -----------------------------------------------------------------------
>
>                 Key: KUDU-2100
>                 URL: https://issues.apache.org/jira/browse/KUDU-2100
>             Project: Kudu
>          Issue Type: Test
>            Reporter: Alexey Serbin
>            Assignee: Edward Fancher
>
> This is to introduce a scenario where both the leader tserver and the leader 
> master 'unexpectedly crash' during the run. The idea is to verify that the 
> client automatically updates its metacache, even if the leader master 
> changes, and eventually manages to send the data to the destination server.
> Mike suggested the following test scenario:
> # Have a configuration with 3 master servers, 6 tablet servers, and a table 
> consisting of 1 tablet with a replication factor of 3.  Let's assume the 
> tablet's replicas are hosted by tablet servers TS1, TS2, and TS3.
> # Start the Kudu cluster.
> # Run the client to insert at least one row into the table.
> # Stop the client's activity, but keep the client object alive to keep it 
> ready for the next steps.
> # Three times: permanently kill the leader replica of the tablet, so that the 
> tablet eventually migrates to, and is hosted by, tablet servers TS4, TS5, and 
> TS6.
> # Kill the leader master (after the configuration change is committed).
> # Run the pre-warmed client to insert some data into the table again.  In 
> doing so, the client should refresh its metadata from the new leader master 
> and be able to send the data to the right destination.
> # Count the number of rows in the table to make sure it matches the expected 
> count.
> There was a discussion on when to kill the leader master: before or after 
> moving the tablet to the new set of tablet servers.  It seems the latter case 
> (the sequence suggested above) allows covering a situation in which no master 
> server recognizes itself as the leader.  The client should retry in that case 
> as well and eventually receive the tablet location info from the established 
> leader master.  If possible, let's also implement the sequence for the former 
> case as an additional test.
> The general idea is to make sure that, during fail-over events, the Java 
> client:
> * Automatically retries write and read operations on errors caused by a 
> fail-over event.
> * Does not silently lose any data: if the client cannot send the data due to 
> a timeout or because it has run out of retry attempts, it should report the 
> failure.
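
A toy, in-memory model may help pin down the expected client behavior before writing the real test against a Kudu cluster. The sketch below is not the Kudu Java client or its test harness: the `Cluster` and `Client` classes, their methods, and the one-lookup "no leader master" window are all hypothetical assumptions made for illustration. It replays the suggested sequence (warm the metacache, kill the tablet leader three times, kill the leader master, insert again) and shows both properties listed above: the client retries after invalidating stale or missing location info, and it throws instead of silently dropping rows once it runs out of attempts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy in-memory model of the KUDU-2100 scenario. All classes and methods are
// hypothetical; this is not Kudu's Java client API or its test harness.
public class FailoverScenarioSketch {

    // Hypothetical cluster: 3 masters, a 3-replica tablet, 3 spare tservers.
    static class Cluster {
        final Deque<String> masters = new ArrayDeque<>(List.of("M1", "M2", "M3"));
        final List<String> replicas = new ArrayList<>(List.of("TS1", "TS2", "TS3"));
        final Deque<String> spares = new ArrayDeque<>(List.of("TS4", "TS5", "TS6"));
        final Set<String> dead = new HashSet<>();
        int lookupsWithoutLeader = 0; // models a window with no leader master
        int rows = 0;

        // The first replica in the list is treated as the tablet leader.
        String tabletLeader() { return replicas.get(0); }

        // Location lookup via the leader master; null means "no leader (yet)".
        String lookupTabletLeader() {
            if (masters.isEmpty()) return null;
            if (lookupsWithoutLeader > 0) {
                lookupsWithoutLeader--;
                return null;
            }
            return tabletLeader();
        }

        // Permanently kill the tablet leader; a spare tserver gets a new replica.
        void killTabletLeader() {
            dead.add(replicas.remove(0));
            replicas.add(spares.removeFirst());
        }

        // Kill the leader master; the next lookup arrives before re-election.
        void killLeaderMaster() {
            masters.removeFirst();
            lookupsWithoutLeader = 1;
        }

        // A write is accepted only by a live server that is the tablet leader.
        boolean write(String server, int n) {
            if (dead.contains(server) || !server.equals(tabletLeader())) return false;
            rows += n;
            return true;
        }
    }

    // Hypothetical client with a one-entry metacache for the tablet leader.
    static class Client {
        private final Cluster cluster;
        private String cachedLeader;

        Client(Cluster cluster) { this.cluster = cluster; }

        // Retries a bounded number of times; throws rather than dropping rows.
        void insert(int n, int maxAttempts) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (cachedLeader == null) cachedLeader = cluster.lookupTabletLeader();
                if (cachedLeader != null && cluster.write(cachedLeader, n)) return;
                cachedLeader = null; // invalidate stale or missing location, then retry
            }
            throw new IllegalStateException(
                    "insert of " + n + " rows failed after " + maxAttempts + " attempts");
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Client client = new Client(cluster);
        client.insert(1, 5);                 // warms the metacache with TS1
        for (int i = 0; i < 3; i++) {
            cluster.killTabletLeader();      // tablet ends up on TS4, TS5, TS6
        }
        cluster.killLeaderMaster();          // M2 takes over after one failed lookup
        client.insert(10, 5);                // TS1 fails, no leader, then TS4 succeeds
        System.out.println(cluster.replicas + " rows=" + cluster.rows);
    }
}
```

Running `main` prints `[TS4, TS5, TS6] rows=11`: the second insert first fails against the dead TS1, then hits the window with no leader master, then succeeds against TS4. Calling `insert(10, 2)` at that point instead would throw with only the first row written, which is the "report, don't lose silently" case. The final `rows` counter stands in for the scan-based row count in the last step of the scenario.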



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
