Hi Ruchir,

The large number of blocked flushes and the number of pending
compactions would still indicate IO contention. Can you post the output of
'iostat -x 5 5'?

If you do in fact have spare IO, there are several configuration options
you can tune, such as increasing the number of flush writers and
compaction_throughput_mb_per_sec.
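
As a minimal sketch, the relevant cassandra.yaml settings would look like
this (the values are only illustrative starting points, not recommendations
for your hardware):

    memtable_flush_writers: 4
    compaction_throughput_mb_per_sec: 32

Compaction throughput can also be changed on a live node without a restart:

    nodetool setcompactionthroughput 32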

Mark


On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ruchir....@gmail.com> wrote:

> Also, Mark, regarding your comment on my tpstats output: below is my iostat
> output. The iowait is at 4.59%, which suggests no IO pressure, but we are
> still seeing the bad flush performance. Should we try increasing the flush
> writers?
>
>
> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
>  _x86_64_        (24 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>                   5.80   10.25    0.65    4.59    0.00   78.72
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             103.83      9630.62     11982.60 3231174328 4020290310
> dm-0             13.57       160.17        81.12   53739546   27217432
> dm-1              7.59        16.94        43.77    5682200   14686784
> dm-2           5792.76     32242.66     45427.12 10817753530 15241278360
> sdb             206.09     22789.19     33569.27 7646015080 11262843224
>
>
>
> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ruchir....@gmail.com> wrote:
>
>> nodetool status:
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load     Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.10.20.27  1.89 TB  256     25.4%             76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
>> UN  10.10.20.62  1.83 TB  256     25.5%             84b47313-da75-4519-94f3-3951d554a3e5  rack1
>> UN  10.10.20.47  1.87 TB  256     24.7%             bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
>> UN  10.10.20.45  1.7 TB   256     22.6%             8d6bce33-8179-4660-8443-2cf822074ca4  rack1
>> UN  10.10.20.15  1.86 TB  256     24.5%             01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
>> UN  10.10.20.31  1.87 TB  256     24.9%             1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
>> UN  10.10.20.35  1.86 TB  256     25.8%             17cb8772-2444-46ff-8525-33746514727d  rack1
>> UN  10.10.20.51  1.89 TB  256     25.0%             0343cd58-3686-465f-8280-56fb72d161e2  rack1
>> UN  10.10.20.19  1.91 TB  256     25.5%             30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
>> UN  10.10.20.39  1.93 TB  256     26.0%             b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
>> UN  10.10.20.52  1.81 TB  256     25.4%             6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
>> UN  10.10.20.22  1.89 TB  256     24.8%             46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>>
>>
>> Note: The new node is not part of the above list.
>>
>> nodetool compactionstats:
>>
>> pending tasks: 1649
>>   compaction type   keyspace   column family                       completed    total         unit   progress
>>   Compaction        iprod      customerorder                       1682804084   17956558077   bytes  9.37%
>>   Compaction        prodgatecustomerorder                          1664239271   1693502275    bytes  98.27%
>>   Compaction        qa_config_bkupfixsessionconfig_hist            2443         27253         bytes  8.96%
>>   Compaction        prodgatecustomerorder_hist                     1770577280   5026699390    bytes  35.22%
>>   Compaction        iprodgatecustomerorder_hist                    2959560205   312350192622  bytes  0.95%
>>
>>
>>
>>
>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <mark.re...@boxever.com>
>> wrote:
>>
>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>> including the new one.
>>>
>>>
>>> Ok, so you have num_tokens set to 256 for all nodes with initial_token
>>> commented out. This means you are using vnodes, and the new node will
>>> automatically grab a list of tokens to take over responsibility for.
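>>>
>>> For reference, a minimal sketch of the relevant cassandra.yaml lines on
>>> each node, matching what you described:
>>>
>>>     num_tokens: 256
>>>     # initial_token: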
>>>
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> FlushWriter                       0         0           1136         0               512
>>>>
>>>> Looks like about 50% of flushes are blocked.
>>>>
>>>
>>> This is a problem as it indicates that the IO system cannot keep up.
>>>
>>> Just ran this on the new node:
>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>> 10
>>>
>>>
>>> This is normal as the new node will most likely take tokens from all
>>> nodes in the cluster.
>>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>>> other existing nodes have themselves in the seeds list, but the new node
>>>> does not have itself in the seeds list. Can that cause this issue?
>>>
>>>
>>> Seeds are only used when a new node is bootstrapping into the cluster and
>>> needs a set of IPs to contact and discover the cluster, so this would have
>>> no impact on data sizes or streaming. In general, it is considered best
>>> practice to have a set of 2-3 seeds from each data center, with all nodes
>>> having the same seed list.
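>>>
>>> As a sketch only (the IPs below are just three of the nodes from your
>>> status output, used as placeholders), every node's cassandra.yaml would
>>> then carry the same list:
>>>
>>>     seed_provider:
>>>         - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>>           parameters:
>>>               - seeds: "10.10.20.27,10.10.20.62,10.10.20.47"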
>>>
>>>
>>> What is the current output of 'nodetool compactionstats'? Could you also
>>> paste the output of nodetool status <keyspace>?
>>>
>>> Mark
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ruchir....@gmail.com> wrote:
>>>
>>>> Sorry for the multiple updates, but another thing I found was all the
>>>> other existing nodes have themselves in the seeds list, but the new node
>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ruchir....@gmail.com>
>>>> wrote:
>>>>
>>>>> Just ran this on the new node:
>>>>>
>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>> 10
>>>>>
>>>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>>>> expected in a vnodes enabled environment?
>>>>>
>>>>> Ruchir.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ruchir....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>> tpstats output:
>>>>>>
>>>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>>>> FlushWriter                       0         0           1136         0               512
>>>>>>
>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ruchir....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>> including the new one.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <mark.re...@boxever.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>
>>>>>>>>
>>>>>>>> If you are using vnodes and you have num_tokens set to 256, the new
>>>>>>>> node will take token ranges dynamically. What is the configuration of
>>>>>>>> your other nodes? Are you setting num_tokens or initial_token on those?
>>>>>>>>
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ruchir....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>
>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>
>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>>> 400) Writing Memtable
>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>
>>>>>>>>> So basically it is just busy flushing and compacting. Would you have
>>>>>>>>> any idea why the disk usage has blown up 2x? My understanding was that
>>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>>> heaviest node and bisects its token range. The heaviest node is around
>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>> compaction is falling behind?
>>>>>>>>>
>>>>>>>>> Ruchir
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>> patri...@thelastpickle.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ruchir,
>>>>>>>>>>
>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>
>>>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ruchir....@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The 
>>>>>>>>>>> bootstrap
>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on 
>>>>>>>>>>> the new
>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new 
>>>>>>>>>>> node is
>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>
>>>>>>>>>>> One thing that I noticed that seemed off: the seeds property in the
>>>>>>>>>>> yaml of the 13th node comprises nodes 1..12, whereas the seeds
>>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes
>>>>>>>>>>> except the thirteenth node. Is this an issue?
>>>>>>>>>>>
>>>>>>>>>>> Any other insight would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Ruchir.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Patricia Gorla
>>>>>>>>>> @patriciagorla
>>>>>>>>>>
>>>>>>>>>> Consultant
>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
