It was suggested to me that I try running scrub on the other nodes in the cluster, since the runtime exceptions I was seeing might be related to some bad data. I am going to try that this morning and see how things go. I'm not sure how long is long enough for nodetool scrub to run on a box, though.
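Roughly what I plan to run, one node at a time. Take this as a sketch rather than the exact commands: the keyspace "pi" and table "__shardindex" are just the ones that show up in the compaction errors below, and the host is one of the nodes from the status output.

    # scrub only the table that appears in the compaction errors
    nodetool -h 10.86.123.86 scrub pi __shardindex

    # keep an eye on progress; scrub shows up in the compaction queue
    nodetool -h 10.86.123.86 compactionstats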
As for the load... Here's the spread on the current cluster:

[stan.lemon@cass-d101 ~]$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DALLAS
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.86.123.86   276.19 GB  256     4.0%   2386b94b-fe99-4cb0-8053-321c0540db45  RAC1
UN  10.81.122.66   261.38 GB  256     4.4%   b4533802-83c3-4e57-bbea-6b63294ba377  RAC1
UN  10.81.122.64   266.85 GB  256     4.3%   391a6dfc-254a-43cf-8f25-5518e8ab6511  RAC1
UN  10.86.123.84   290.27 GB  256     4.2%   14979aeb-e0a8-4f7d-866e-0e701a4f774f  RAC1
UN  10.86.123.82   289.96 GB  256     4.5%   65df8d81-0ec1-4f67-81c1-06e86e48593a  RAC1
UN  10.86.123.80   290.81 GB  256     4.4%   c4276398-0c76-4802-b92e-e08a3a0e319f  RAC1
UN  10.84.78.120   290.74 GB  256     4.5%   fce37c3d-c142-40b5-978c-ab8e59939b2f  RAC1
UN  10.84.78.118   287.85 GB  256     4.3%   cfd64c76-fb08-4a3a-b88e-bc19c45115c6  RAC1
UN  10.86.123.78   290.96 GB  256     4.1%   32cc866f-7b5f-4310-ac4a-e0f5dd650b78  RAC1
UN  10.86.123.76   295.52 GB  256     4.1%   bb1b80ba-28bf-4a39-9623-16e326eaaf09  RAC1
UN  10.81.122.62   286.81 GB  256     4.1%   ef255fd1-beee-4dc0-80f5-9ae2271c6398  RAC1
UN  10.86.123.74   303.25 GB  256     4.3%   041d7ab7-d1bd-4a79-afb7-9c6ab1857ee9  RAC1
Datacenter: SEATTLE
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.29.82.80    297.11 GB  256     4.3%   a0e61c1e-e48f-4ccd-afa4-5069d5671382  RAC1
UN  10.29.82.156   304.74 GB  256     4.3%   d17abc57-eb47-41de-8cd5-a341a38b16de  RAC1
UN  10.29.82.158   289.63 GB  256     4.4%   f47d4019-7fd9-4620-9465-d1199311de36  RAC1
UN  10.29.82.152   285.99 GB  256     4.1%   23ee0c6f-5ac7-475a-be13-7d0536619da3  RAC1
UN  10.29.82.168   285.39 GB  256     3.8%   f5f2f55c-e316-4281-b472-f572601c7618  RAC1
UN  10.29.82.154   287.8 GB   256     4.0%   29cd9781-985a-49ed-9910-46279f50bbba  RAC1
UN  10.29.82.166   282.9 GB   256     4.1%   627b0a9e-c0d0-4a90-9cbe-22f7fbb81f9f  RAC1
UN  10.29.82.148   291.17 GB  256     4.0%   c52b467f-8960-4c4f-951a-b4232bbd25ee  RAC1
UN  10.29.82.164   269.74 GB  256     3.9%   7fba7779-c705-45bb-a0ae-26a5dff93374  RAC1
UN  10.29.82.150   281.93 GB  256     4.1%   63165266-bfda-4bd5-b339-e103546bb853  RAC1
UN  10.29.82.162   294.11 GB  256     3.9%   933a495f-4ed7-4bf9-97d7-2ce2c58f5200  RAC1
UN  10.29.82.160   261.22 GB  256     4.0%   7baaeb81-b46b-441a-bb29-914247ec3fac  RAC1

On Wed, Aug 5, 2015 at 9:54 PM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

> What's your average data per node? Is 230gb close?
>
> All the best,
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
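A quick way to average those Load figures straight from the output above (a rough sketch; it assumes every node reports its load in GB, as all of them do here):

    nodetool status | awk '/^UN/ {sum += $3; n++} END {printf "average load: %.2f GB across %d nodes\n", sum / n, n}'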
>
> On Wed, Aug 5, 2015 at 8:33 AM, Stan Lemon <sle...@salesforce.com> wrote:
>
>> I set the stream timeout to 1 hour this morning and started fresh trying
>> to join this node. It took about an hour to stream over 230gb of data, and
>> then into hour two I wound up back where I was yesterday: the node's load
>> is slowly shrinking, and netstats does not show it sending or receiving
>> anything. I'm not sure how long I should wait before I throw in the towel
>> on this attempt. I'm also not really sure what to try next...
>>
>> The only things in the logs currently are three entries like this:
>>
>> ERROR 07:39:44,447 Exception in thread Thread[CompactionExecutor:31,1,main]
>> java.lang.RuntimeException: Last written key DecoratedKey(8633837336094175369, 003076697369746f725f706167655f766965623936636232346331623661313935313634346638303838393465313132373700004930303030663264632d303030302d303033302d343030302d3030303030303030663264633a66376436366166382d383564352d313165342d383030302d30303030303035343764623600) >= current key DecoratedKey(-6568345298384940765, 003076697369746f725f706167655f766965623936636232346331623661313935313634346638303838393465313132373700004930303030376464652d303030302d303033302d343030302d3030303030303030376464653a64633930336533382d643766342d313165342d383030302d30303030303730626338386300) writing into /var/lib/cassandra/data/pi/__shardindex/pi-__shardindex-tmp-jb-644-Data.db
>>   at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
>>   at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
>>   at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:170)
>>   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>   at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>   at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>   at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> ANY help is greatly appreciated.
>>
>> Thanks,
>> Stan
>>
>> On Tue, Aug 4, 2015 at 2:23 PM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>
>>> That's the one. I set it to an hour to be safe (if a stream goes above
>>> the timeout it will get restarted), but it can probably be lower.
>>>
>>> All the best,
>>>
>>> Sebastián Estévez
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> On Tue, Aug 4, 2015 at 2:21 PM, Stan Lemon <sle...@salesforce.com> wrote:
>>>
>>>> Sebastian,
>>>> You're referring to streaming_socket_timeout_in_ms, correct? What value
>>>> do you recommend? All of my nodes are currently at the default of 0.
>>>>
>>>> Thanks,
>>>> Stan
>>>>
>>>> On Tue, Aug 4, 2015 at 2:16 PM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>
>>>>> It helps to set the stream socket timeout in the yaml so that you don't
>>>>> hang forever on a lost or broken stream.
>>>>>
>>>>> All the best,
>>>>>
>>>>> Sebastián Estévez
>>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>>>
>>>>> On Tue, Aug 4, 2015 at 2:14 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>
>>>>>> On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon <sle...@salesforce.com> wrote:
>>>>>>
>>>>>>> I am attempting to add a 13th node in one of the datacenters. I have
>>>>>>> been monitoring this process from the node itself with nodetool netstats
>>>>>>> and from one of the existing nodes using nodetool status.
>>>>>>>
>>>>>>> On the existing node I see the new node as UJ. I have watched the
>>>>>>> load steadily climb up to about 203.4gb, and then over the last two hours
>>>>>>> it has fluctuated a bit and slowly dropped back to about 203.1gb.
>>>>>>
>>>>>> It's probably hung. If I were you I'd probably wipe the node and
>>>>>> re-bootstrap.
>>>>>>
>>>>>> (What version of Cassandra / what network are you on (AWS?) / etc.)
>>>>>>
>>>>>> =Rob
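Spelling out the two suggestions above as commands, for the record. This is a rough sketch: the cassandra.yaml path assumes a package install, and the wipe paths are the stock Cassandra defaults, so adjust for your layout.

    # 1) Sebastian's suggestion: set the stream socket timeout to 1 hour (3600000 ms)
    #    instead of the default 0 (never time out), then restart so it takes effect
    sudo sed -i 's/^streaming_socket_timeout_in_ms:.*/streaming_socket_timeout_in_ms: 3600000/' /etc/cassandra/conf/cassandra.yaml
    sudo service cassandra restart

    # 2) Rob's suggestion: wipe the stuck joining node and bootstrap it again from scratch
    #    (only on the new node that never finished joining, never on a node that already owns data)
    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
    sudo service cassandra start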