Also Mark, to your comment on my tpstats output: below is my iostat output. iowait is at 4.59%, which suggests no I/O pressure, but we are still seeing the poor flush performance. Should we try increasing the flush writers?
Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014  _x86_64_  (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.80   10.25    0.65    4.59    0.00   78.72

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             103.83      9630.62     11982.60   3231174328   4020290310
dm-0             13.57       160.17        81.12     53739546     27217432
dm-1              7.59        16.94        43.77      5682200     14686784
dm-2           5792.76     32242.66     45427.12  10817753530  15241278360
sdb             206.09     22789.19     33569.27   7646015080  11262843224

On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ruchir....@gmail.com> wrote:

> nodetool status:
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  10.10.20.27  1.89 TB  256     25.4%             76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
> UN  10.10.20.62  1.83 TB  256     25.5%             84b47313-da75-4519-94f3-3951d554a3e5  rack1
> UN  10.10.20.47  1.87 TB  256     24.7%             bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
> UN  10.10.20.45  1.7 TB   256     22.6%             8d6bce33-8179-4660-8443-2cf822074ca4  rack1
> UN  10.10.20.15  1.86 TB  256     24.5%             01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
> UN  10.10.20.31  1.87 TB  256     24.9%             1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
> UN  10.10.20.35  1.86 TB  256     25.8%             17cb8772-2444-46ff-8525-33746514727d  rack1
> UN  10.10.20.51  1.89 TB  256     25.0%             0343cd58-3686-465f-8280-56fb72d161e2  rack1
> UN  10.10.20.19  1.91 TB  256     25.5%             30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
> UN  10.10.20.39  1.93 TB  256     26.0%             b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
> UN  10.10.20.52  1.81 TB  256     25.4%             6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
> UN  10.10.20.22  1.89 TB  256     24.8%             46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>
> Note: The new node is not part of the above list.
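For readers of the archive, the 4.59% figure cited above comes from the avg-cpu section of the iostat paste. A minimal sketch of extracting it programmatically; the field layout assumed here matches the output shown, but other iostat versions may order columns differently:

```python
def parse_iowait(iostat_text: str) -> float:
    """Return the %iowait value from `iostat` output (avg-cpu layout as pasted above)."""
    lines = iostat_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("avg-cpu:"):
            # The header names the columns; the values sit on the next line.
            header = line.replace("avg-cpu:", "").split()
            values = [float(v) for v in lines[i + 1].split()]
            return values[header.index("%iowait")]
    raise ValueError("no avg-cpu section found")

sample = """avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.80   10.25    0.65    4.59    0.00   78.72"""
print(parse_iowait(sample))  # -> 4.59
```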
>
> nodetool compactionstats:
>
> pending tasks: 1649
> compaction type  keyspace / column family  completed  total  unit  progress
> Compaction  iprod customerorder  1682804084  17956558077 bytes  9.37%
> Compaction  prodgatecustomerorder  1664239271  1693502275 bytes  98.27%
> Compaction  qa_config_bkupfixsessionconfig_hist  2443  27253 bytes  8.96%
> Compaction  prodgatecustomerorder_hist  1770577280  5026699390 bytes  35.22%
> Compaction  iprodgatecustomerorder_hist  2959560205  312350192622 bytes  0.95%
>
> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <mark.re...@boxever.com>
> wrote:
>
>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>> including the new one.
>>
>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>> commented out, this means you are using vnodes and the new node will
>> automatically grab a list of tokens to take over responsibility for.
>>
>>> Pool Name      Active  Pending  Completed  Blocked  All time blocked
>>> FlushWriter         0        0       1136        0               512
>>>
>>> Looks like about 50% of flushes are blocked.
>>
>> This is a problem as it indicates that the IO system cannot keep up.
>>
>>> Just ran this on the new node:
>>> nodetool netstats | grep "Streaming from" | wc -l
>>> 10
>>
>> This is normal as the new node will most likely take tokens from all
>> nodes in the cluster.
>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>> other existing nodes have themselves in the seeds list, but the new node
>>> does not have itself in the seeds list. Can that cause this issue?
>>
>> Seeds are only used when a new node is bootstrapping into the cluster and
>> needs a set of IPs to contact and discover the cluster, so this would have
>> no impact on data sizes or streaming. In general it would be considered
>> best practice to have a set of 2-3 seeds from each data center, with all
>> nodes having the same seed list.
>>
>> What is the current output of 'nodetool compactionstats'?
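The "about 50%" figure discussed in this thread comes from comparing the FlushWriter pool's all-time blocked count against its completed count; the arithmetic, using the tpstats numbers quoted above:

```python
# FlushWriter figures from the `nodetool tpstats` output quoted in this thread.
completed = 1136
all_time_blocked = 512

# Blocked flushes relative to completed ones; the basis for the
# "about 50% of flushes are blocked" observation.
ratio = all_time_blocked / completed
print(f"{ratio:.0%}")  # -> 45%
```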
>> Could you also paste the output of nodetool status <keyspace>?
>>
>> Mark
>>
>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ruchir....@gmail.com> wrote:
>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>> other existing nodes have themselves in the seeds list, but the new node
>>> does not have itself in the seeds list. Can that cause this issue?
>>>
>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ruchir....@gmail.com>
>>> wrote:
>>>
>>>> Just ran this on the new node:
>>>>
>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>> 10
>>>>
>>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>>> expected in a vnodes enabled environment?
>>>>
>>>> Ruchir.
>>>>
>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ruchir....@gmail.com>
>>>> wrote:
>>>>
>>>>> Also not sure if this is relevant, but just noticed the nodetool
>>>>> tpstats output:
>>>>>
>>>>> Pool Name      Active  Pending  Completed  Blocked  All time blocked
>>>>> FlushWriter         0        0       1136        0               512
>>>>>
>>>>> Looks like about 50% of flushes are blocked.
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ruchir....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>> including the new one.
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <mark.re...@boxever.com>
>>>>>> wrote:
>>>>>>
>>>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>>
>>>>>>> If you are using vnodes and you have num_tokens set to 256, the new
>>>>>>> node will take token ranges dynamically. What is the configuration of
>>>>>>> your other nodes, are you setting num_tokens or initial_token on those?
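To illustrate the vnode behaviour discussed in this thread: with num_tokens: 256 and initial_token blank, a joining node ends up with many small token ranges scattered around the ring, which is why it streams from many peers instead of bisecting a single node's range. A rough sketch of the idea; this is a simplification, not Cassandra's actual token-allocation code:

```python
import random

# Murmur3Partitioner token range used by Cassandra.
MURMUR3_MIN, MURMUR3_MAX = -(2**63), 2**63 - 1

def pick_vnode_tokens(num_tokens: int = 256) -> list[int]:
    # When initial_token is blank and num_tokens is set, the joining node
    # takes num_tokens small ranges spread across the ring, so it streams
    # slices from many peers. (Simplified: real allocation also handles
    # collisions and, in later versions, balance.)
    return sorted(random.randint(MURMUR3_MIN, MURMUR3_MAX) for _ in range(num_tokens))

tokens = pick_vnode_tokens()
print(len(tokens))  # 256 vnode tokens, matching num_tokens: 256
```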
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ruchir....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Patricia for your response!
>>>>>>>>
>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>
>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>> 400) Writing Memtable
>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>
>>>>>>>> so basically it is just busy flushing and compacting. Would you
>>>>>>>> have any ideas on why the disk space blew up 2x? My understanding
>>>>>>>> was that if initial_token is left empty on the new node, it just
>>>>>>>> contacts the heaviest node and bisects its token range. And the
>>>>>>>> heaviest node is around 2.1 TB, and the new node is already at 4 TB.
>>>>>>>> Could this be because compaction is falling behind?
>>>>>>>>
>>>>>>>> Ruchir
>>>>>>>>
>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>> patri...@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>>> Ruchir,
>>>>>>>>>
>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>
>>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ruchir....@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on
>>>>>>>>>> the new node is already above 4 TB and still going. Is this because
>>>>>>>>>> the new node is running major compactions while the streaming is
>>>>>>>>>> going on?
>>>>>>>>>>
>>>>>>>>>> One thing that I noticed that seemed off was that the seeds
>>>>>>>>>> property in the yaml of the 13th node comprises nodes 1..12,
>>>>>>>>>> whereas the seeds property on the existing 12 nodes consists of
>>>>>>>>>> all the other nodes except the thirteenth node. Is this an issue?
>>>>>>>>>>
>>>>>>>>>> Any other insight is appreciated.
>>>>>>>>>>
>>>>>>>>>> Ruchir.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Patricia Gorla
>>>>>>>>> @patriciagorla
>>>>>>>>>
>>>>>>>>> Consultant
>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
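A back-of-the-envelope check on the disk usage discussed in this thread: with 12 nodes averaging ~2.1 TB and roughly even vnode ownership after the join, a 13th node should settle near 1/13 of the total, so the 4 TB observed mid-bootstrap is about double the expected steady-state load, consistent with streamed SSTables accumulating faster than compaction can merge them. The numbers (taken from the thread; the even-ownership assumption is mine):

```python
# Figures from the thread: 12 nodes averaging ~2.1 TB each, new node
# already at ~4 TB during bootstrap. Assumes roughly even vnode ownership.
nodes_before = 12
avg_load_tb = 2.1
observed_new_node_tb = 4.0

total_tb = nodes_before * avg_load_tb
expected_tb = total_tb / (nodes_before + 1)  # steady-state share of a 13th node

print(round(expected_tb, 2))                         # -> 1.94
print(round(observed_new_node_tb / expected_tb, 1))  # -> 2.1 (about double)
```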