read workloads (up to 100s downtime) in case of a Cassandra node failure

Daniel Seybold Fri, 23 Nov 2018 01:21:56 -0800

Hi Alexander,

thanks a lot for the pointers, I checked the mentioned issue.

While the reported issue seems to match our problem, it only occurs for reads and not for writes (according to the Datastax Jira). But we experience downtimes for both writes and reads.

Which version of the Datastax Driver are you using for your tests?

We use version 3.0.0

I have also tried version 3.2.0 to avoid the JAVA-1346 issue you mentioned, but the behaviour with respect to the downtime is still the same.

How is it configured (load balancing policies, etc...) ?

Besides the write consistency of ONE it uses the default settings.

As we use YCSB as the workload for our experiments, you can have a look at the driver settings in the basic class: https://github.com/brianfrankcooper/YCSB/blob/master/cassandra/src/main/java/com/yahoo/ycsb/db/CassandraCQLClient.java

Do you have some debug logs on the client side that could help?

On the client side, the logs show no exceptions or suspicious messages.

I also turned on tracing but didn't find any suspicious messages (yet I did not spend too much time on that and I am no expert on the Cassandra driver).

If more detailed logs or the traces would help to further investigate the issue, let me know and I will rerun the experiments to create the logs and traces.


Many thanks again for your help.

Cheers,

Daniel


On 16.11.2018 at 15:08, Alexander Dejanovski wrote:

Hi Daniel,

it seems like the driver isn't detecting that the node went down, which is probably due to the way the node is being killed. If I remember correctly, in some cases the Netty transport is still up in the client, which still allows queries to be sent without them ever being answered: https://datastax-oss.atlassian.net/browse/JAVA-1346

Eventually, the node gets discarded when the heartbeat system catches up.

It's also possible that the stuck queries then eat up all the available slots in the driver, preventing any other query from being sent in that JVM.
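The slot-exhaustion effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the driver's actual behaviour: it assumes synchronous client threads (as in a blocking benchmark client), round-robin coordinator selection, and made-up values for `threads`, `detect_after` and `timeout_ticks`. All names are hypothetical.

```python
def simulate(ticks, threads=8, nodes=5, dead_node=0,
             kill_tick=10, detect_after=30, timeout_ticks=60):
    """Return the number of completed operations per tick for a toy client.

    Hypothetical model, not the real driver:
    - each free thread issues one request per tick, round-robin over nodes;
    - a request to a healthy node completes within the same tick;
    - from `kill_tick` on, a request to `dead_node` never answers and blocks
      its thread for `timeout_ticks` ticks;
    - `detect_after` ticks after the kill, the client marks the node down
      and routes around it.
    """
    blocked_until = [0] * threads  # tick at which each thread is free again
    rr = 0                         # round-robin counter
    throughput = []
    for t in range(ticks):
        node_dead = t >= kill_tick
        node_known_down = t >= kill_tick + detect_after
        done = 0
        for i in range(threads):
            if blocked_until[i] > t:
                continue           # still waiting for an answer that never comes
            target = rr % nodes
            rr += 1
            if node_known_down and target == dead_node:
                target = (target + 1) % nodes  # skip the node marked down
            if node_dead and target == dead_node:
                blocked_until[i] = t + timeout_ticks
            else:
                done += 1
        throughput.append(done)
    return throughput


series = simulate(ticks=100)
stalled = [t for t, done in enumerate(series) if done == 0]
print("first stalled tick:", stalled[0], "last stalled tick:", stalled[-1])
```

With these made-up parameters, throughput decays to zero once every thread has hit the dead node, and the stall only ends when the first blocked requests time out, which is later than the point at which the node is marked down.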


Which version of the Datastax Driver are you using for your tests?
How is it configured (load balancing policies, etc...) ?
Do you have some debug logs on the client side that could help?

Thanks,

On Fri, Nov 16, 2018 at 1:19 PM Daniel Seybold <daniel.seyb...@uni-ulm.de> wrote:


    Hi Sean,

    thanks for your comments, find below some more details with
    respect to the (1) VM sizing and (2) the replication factor:

    (1) VM sizing:

    We selected the small VMs as the initial setup to run our experiments.
    We have also executed the same experiments (5 nodes) on larger VMs
    with 6 cores and 12GB memory (where 6GB was allocated to Cassandra).

    We use the default CMS garbage collector (with default settings),
    and the debug.log and system.log do not show any suspicious GC
    messages.

    (2) Replication factor

    We set the RF to 5 as we want to emulate a scenario which is able
    to survive multiple-node failures. We have also tried a RF of 3
    (in the 5 node cluster) but the downtime in case of a node failure
    persists.
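As a rough cross-check of the replication settings discussed here, the following sketch computes how many replica failures a request can tolerate for a given replication factor and consistency level. The function names are hypothetical; the quorum formula (a majority of the replicas) is the standard definition.

```python
def required_replicas(consistency, rf):
    """Replicas that must acknowledge a request at the given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # majority of the replicas
        "ALL": rf,
    }
    return levels[consistency]


def tolerated_failures(consistency, rf):
    """Replicas that may be down while the request can still succeed."""
    return rf - required_replicas(consistency, rf)


for rf in (3, 5):
    for cl in ("ONE", "QUORUM", "ALL"):
        print(f"RF={rf} CL={cl}: tolerates {tolerated_failures(cl, rf)} replica failure(s)")
```

At consistency level ONE, both RF 5 and RF 3 leave room for the loss of a single replica, which suggests that the observed stall is not explained by a lack of available replicas alone.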


    I also attached two plots which show the downtimes when using
    the larger VMs and setting the RF to 3.

    Any further comments much appreciated,

    Cheers,
    Daniel


    On 09.11.2018 at 19:04, Durity, Sean R wrote:


    The VMs’ memory (4 GB) seems pretty small for Cassandra. What
    heap size are you using? Which garbage collector? Are you seeing
    long GC times on the nodes? The basic rule of thumb is to give
    the Cassandra heap 50% of the RAM on the host. 2 GB isn’t very much.

    Also, I wouldn’t set the replication factor to 5 (the number of
    nodes). If RF is always equal to the number of nodes, you can’t
    really scale beyond the size of the disk on any one node (all
    data is on each node). A replication factor of 3 would be more
    like a typical production set-up.
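The two sizing points above can be made concrete with a small calculation. This is a sketch only: the function names are hypothetical, the per-node share assumes data is spread evenly across the nodes, and the 50% heap figure is the rule of thumb quoted above rather than a fixed requirement.

```python
def per_node_share(rf, nodes):
    """Fraction of the full data set each node stores, assuming even distribution."""
    return min(rf, nodes) / nodes


def heap_rule_of_thumb_gb(ram_gb):
    """Heap size suggested by the 50%-of-RAM rule of thumb."""
    return ram_gb * 0.5


print(per_node_share(rf=5, nodes=5))  # every node holds the whole data set
print(per_node_share(rf=3, nodes=5))  # each node holds a part of it
print(heap_rule_of_thumb_gb(4))       # small VMs from the original setup
print(heap_rule_of_thumb_gb(12))      # larger VMs from the follow-up runs
```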

    Sean Durity

    *From:* Daniel Seybold <daniel.seyb...@uni-ulm.de>
    *Sent:* Friday, November 09, 2018 5:49 AM
    *To:* user@cassandra.apache.org
    *Subject:* [EXTERNAL] Availability issues for write/update/read
    workloads (up to 100s downtime) in case of a Cassandra node failure

    Hi Apache Cassandra experts,

    we are running a set of availability evaluations under
    write/read/update workloads with Apache Cassandra and are seeing
    some unexpected results, i.e. 0 ops/s over a period of up to 100s.

    To provide a clear picture, find below the details of (1)
    the setup and (2) the evaluation workflow.

    *1. Setup:*

    Cassandra version: 3.11.2
    Cluster size: 5 nodes
    Replication Factor: 5
    Each node runs in the same private OpenStack-based cloud, within
    the same availability zone, and uses the private network.
    Each node runs Ubuntu 16.04 server as its OS and has 2 cores, 4GB
    RAM and 50GB disk.

    Workload:
    Yahoo Cloud Serving Benchmark 0.12
    W1: 100% write
    W2: 100% read
    W3: 100% update

    *2. Evaluation Workflow:*

    1. allocate 5 VMs & deploy DBMS cluster
    2. start a YCSB workload (only one of W1-3) which runs up to 30
    minutes
    3. wait for 200s
    4. trigger the selection of a random node in the cluster and
    delete its VM without stopping Cassandra first
    5. analyze throughput time series over the evaluation

    *3. (Unexpected) Results:*

    We expected to see a (slight) drop in the throughput as soon as
    the VM was deleted.
    But the throughput results show that there are periods of
    ~10s - 150s (not deterministic) where no operations are executed
    (all metrics are collected on client side)
    Yet, there are no timeout exceptions on client side and also the
    logs on cluster side do not show anything that explains this
    behaviour.
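The analysis step in the workflow above (finding periods with no executed operations in the client-side throughput series) could be sketched as follows. This is a minimal example assuming the throughput has already been parsed into `(time_s, ops_per_s)` pairs; the function name, the sample format and the sample values are hypothetical.

```python
def zero_throughput_windows(samples, min_duration=1):
    """Find periods in which no operations were executed.

    `samples` is a list of (time_s, ops_per_s) pairs sorted by time.
    A window starts at the first sample with 0 ops/s and ends at the next
    sample with non-zero throughput (or at the last sample if the series
    ends while still stalled). Returns (start, end, duration) tuples.
    """
    windows = []
    start = None
    last_time = None
    for time_s, ops in samples:
        if ops == 0:
            if start is None:
                start = time_s
        elif start is not None:
            windows.append((start, time_s))
            start = None
        last_time = time_s
    if start is not None:
        windows.append((start, last_time))
    return [(s, e, e - s) for s, e in windows if e - s >= min_duration]


# Made-up sample series for illustration only.
samples = [(0, 100), (1, 100), (2, 0), (3, 0), (4, 0), (5, 90), (6, 0), (7, 80)]
print(zero_throughput_windows(samples))
print(zero_throughput_windows(samples, min_duration=2))
```

Raising `min_duration` filters out single-sample dips so that only sustained stalls are reported.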

    I attached a series of plots which show the throughput and the
    downtimes over the evaluation runs.

    Do you have any explanations for this behaviour or
    recommendations on how to reduce the potential "downtime"?

    Thanks in advance for any help and recommendations,

    Cheers,
    Daniel

--
M.Sc. Daniel Seybold

    Universität Ulm
    Institut Organisation und Management
    von Informationssystemen (OMI)
    Albert-Einstein-Allee 43
    89081 Ulm
    Phone: +49 (0)731 50-28 799




--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


--
M.Sc. Daniel Seybold

Universität Ulm
Institut Organisation und Management
von Informationssystemen (OMI)
Albert-Einstein-Allee 43
89081 Ulm
Phone: +49 (0)731 50-28 799
