[ https://issues.apache.org/jira/browse/KUDU-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Fancher reassigned KUDU-2100:
------------------------------------

    Assignee: Edward Fancher

> Verify Java client's behavior for tserver and master fail-over scenario
> -----------------------------------------------------------------------
>
>                 Key: KUDU-2100
>                 URL: https://issues.apache.org/jira/browse/KUDU-2100
>             Project: Kudu
>          Issue Type: Test
>            Reporter: Alexey Serbin
>            Assignee: Edward Fancher
>
> This is to introduce a scenario where both the leader tserver and the leader
> master 'unexpectedly crash' during the run. The idea is to verify that the
> client automatically updates its metacache even if the leader master changes,
> and that it eventually manages to send the data to the destination server.
> Mike suggested the following test scenario:
> # Have a configuration with 3 master servers, 6 tablet servers, and a table
> consisting of 1 tablet with a replication factor of 3. Let's assume the
> tablet is hosted by tablet servers TS1, TS2, and TS3.
> # Start the Kudu cluster.
> # Run the client to insert at least one row into the table.
> # Stop the client's activity, but keep the client object alive so it is
> ready for the next steps.
> # 3 times: permanently kill the leader of the tablet, so the tablet
> eventually migrates to and is hosted by tablet servers TS4, TS5, TS6.
> # Kill the leader master (after the configuration change is committed).
> # Run the pre-warmed client to insert some data into the table again. Doing
> so, the client should refresh its metadata from the new leader master and be
> able to send the data to the right destination.
> # Count the number of rows in the table to make sure it matches the
> expectation.
> There was a discussion on when to kill the leader master: before or after
> moving the table to the new set of tablet servers. It seems the latter case
> (the sequence suggested above) allows covering a situation when no master
> server recognizes itself as a leader.
> The client should retry in that case
> as well and eventually receive the tablet location info from the established
> leader master. If possible, let's also implement the sequence for the former
> case, as an additional test.
> The general idea is to make sure that, during fail-over events, the Java
> client:
> * Retries write and read operations automatically when an error is caused by
> a fail-over event.
> * Does not silently lose any data: if the client cannot send the data due to
> a timeout or running out of retry attempts, it should report that.
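
The two properties listed at the end of the description (retry automatically, never lose data silently) can be illustrated with a small self-contained model. This is only a sketch and does not use the Kudu Java client: FailoverRetrySketch, FakeTablet, NoLeaderException, WriteFailedException and writeWithRetry are hypothetical names invented for the illustration, and a bounded attempt count stands in for whatever timeout or retry policy the real client applies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the expected behavior; not the Kudu client API. */
final class FailoverRetrySketch {

    /** Raised by the fake tablet while no leader is available (assumption). */
    static final class NoLeaderException extends RuntimeException {
        NoLeaderException(String message) {
            super(message);
        }
    }

    /** Raised to the caller once retries are exhausted, so the loss is reported. */
    static final class WriteFailedException extends RuntimeException {
        final int attempts;

        WriteFailedException(int row, int attempts, Throwable cause) {
            super("row " + row + " not written after " + attempts + " attempts", cause);
            this.attempts = attempts;
        }
    }

    /** A fake tablet that rejects a fixed number of writes before accepting any. */
    static final class FakeTablet {
        private int remainingFailures;
        final List<Integer> rows = new ArrayList<>();

        FakeTablet(int failuresBeforeLeader) {
            this.remainingFailures = failuresBeforeLeader;
        }

        void write(int row) {
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new NoLeaderException("no leader elected yet");
            }
            rows.add(row);
        }
    }

    /**
     * Tries the write up to maxAttempts times and returns the number of
     * attempts used. If every attempt fails, throws WriteFailedException
     * instead of returning normally.
     */
    static int writeWithRetry(FakeTablet tablet, int row, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        NoLeaderException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                tablet.write(row);
                return attempt;
            } catch (NoLeaderException e) {
                // A real client would refresh its location cache and back off here.
                last = e;
            }
        }
        throw new WriteFailedException(row, maxAttempts, last);
    }
}
```

In a real test the row count check from the last step of the scenario would play the role of inspecting FakeTablet.rows: every write that returned normally must be present, and every write that was not applied must have surfaced an error to the caller.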