Hi,
One more thing: the Hinted Handoff count for the last week was less than 5
on all nodes.
For me, every READ is a problem, because it has to open far too many files
(~30,000 SSTables), and this shows up as errors in reads, repairs, etc.
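For reference, this is roughly how I check it (a sketch only; the keyspace
and table names are the ones from the repair log below, and I'm assuming
the standard nodetool cfstats/cfhistograms commands):

# SSTable count for the table in question
nodetool cfstats prem_maelstrom_2.customer_events | grep 'SSTable count'
# distribution of SSTables touched per read (the "SSTables" column)
nodetool cfhistograms prem_maelstrom_2 customer_events
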
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam <ptrstp...@gmail.com> wrote:

> Hi,
> It is not obvious, because the data is replicated to the second data
> center. We check it "manually" for random records we put into Cassandra,
> and we find all of them in the secondary DC.
> We know about every single GC failure, but this doesn't change anything.
> The only remedy for a GC failure is to restart the node. For a few days
> after a restart we don't see GC errors anymore. It looks to me like a
> memory leak.
> We use Chef.
>
> By MANUAL compaction do you mean running nodetool compact? What would that
> change compared to the compactions that are already running permanently?
>
> Regards
> Piotrek
>
> On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> I think you may have a vicious circle of errors: because your data is not
>> properly replicated to the neighbour, it is not replicating to the
>> secondary data center (yeah, obvious). I would suspect the GC errors are
>> (also obviously) the result of a backlog of compactions that takes out the
>> neighbour. Assuming a replication factor of 3, each "neighbour" is
>> participating in compactions for at least one other node besides the
>> primary you are looking at (and of course it can be much more, depending
>> on e.g. the vnode count, if vnodes are used).
>>
>> What happens is that when a node fails due to a GC error (it can't
>> reclaim space), that causes a cascade of other errors, as you see. Might I
>> suggest you have someone in DevOps with monitoring experience install a
>> monitoring tool that will notify you of EVERY SINGLE Java GC failure
>> event? Your DevOps team may have a favourite log shipping/monitoring tool;
>> they could deploy it with e.g. Puppet.
>>
>> I think you may have to go through a MANUAL, table-by-table compaction.
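>>
>> Something along these lines, one table at a time (a sketch only, assuming
>> the usual nodetool compact syntax; the keyspace/table are the ones from
>> your repair log):
>>
>> # major-compact a single table rather than a whole keyspace at once
>> nodetool compact prem_maelstrom_2 customer_events
>> # watch progress before moving on to the next table
>> nodetool compactionstats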
>>
>>
>> *“Life should not be a journey to the grave with the intention of arriving
>> safely in a pretty and well preserved body, but rather to skid in
>> broadside in a cloud of smoke, thoroughly used up, totally worn out, and
>> loudly proclaiming ‘Wow! What a Ride!’” - Hunter Thompson*
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam <ptrstp...@gmail.com> wrote:
>>
>>> Hi Roni,
>>> The repair result is the following (we ran it on Friday): Cannot proceed
>>> on repair because a neighbor (/192.168.61.201) is dead: session failed
>>>
>>> But to be honest, the neighbor did not die. The repair seemed to trigger
>>> a series of full GC events on the initiating node. The results from the
>>> logs are:
>>>
>>> [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
>>> for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
>>> [2015-02-21 02:21:55,640] Lost notification. You should check server log
>>> for repair status of keyspace prem_maelstrom_2
>>> [2015-02-21 02:22:55,642] Lost notification. You should check server log
>>> for repair status of keyspace prem_maelstrom_2
>>> [2015-02-21 02:23:55,642] Lost notification. You should check server log
>>> for repair status of keyspace prem_maelstrom_2
>>> [2015-02-21 02:24:55,644] Lost notification. You should check server log
>>> for repair status of keyspace prem_maelstrom_2
>>> [2015-02-21 04:41:08,607] Repair session
>>> d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
>>> (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
>>> failed with error org.apache.cassandra.exceptions.RepairException: [repair
>>> #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
>>> (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
>>> Sync failed between /192.168.71.196 and /192.168.61.199
>>> [2015-02-21 04:41:08,608] Repair session
>>> eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
>>> (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
>>> failed with error java.io.IOException: Endpoint /192.168.61.199 died
>>> [2015-02-21 04:41:08,608] Repair session
>>> c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
>>> java.io.IOException: Cannot proceed on repair because a neighbor (/
>>> 192.168.61.201) is dead: session failed
>>> [2015-02-21 04:41:08,609] Repair session
>>> c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
>>> (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
>>> failed with error java.io.IOException: Cannot proceed on repair because a
>>> neighbor (/192.168.61.201) is dead: session failed
>>> [2015-02-21 04:41:08,609] Repair session
>>> c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
>>> (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
>>> failed with error java.io.IOException: Cannot proceed on repair because a
>>> neighbor (/192.168.61.201) is dead: session failed
>>> [2015-02-21 04:41:08,619] Repair session
>>> c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
>>> (136112946768375392941136215564139626496,0] failed with error
>>> java.io.IOException: Cannot proceed on repair because a neighbor (/
>>> 192.168.61.201) is dead: session failed
>>> [2015-02-21 04:41:08,620] Repair session
>>> c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
>>> (102084710076281535261119195933814292480,127605887595351923798765477786913079306]
>>> failed with error java.io.IOException: Cannot proceed on repair because a
>>> neighbor (/192.168.61.201) is dead: session failed
>>> [2015-02-21 04:41:08,620] Repair command #2 finished
>>>
>>>
>>> We tried to run the repair one more time. After 24 hours we got some
>>> streaming errors. Moreover, 2-3 hours later we had to stop it, because we
>>> started to see write timeouts on the client and our system started dying.
>>> The iostat output from the "dying" period, plus tpstats, is available
>>> here:
>>> https://drive.google.com/file/d/0B4N_AbBPGGwLc25nU0lnY3Z5NDA/view
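>>>
>>> In case it helps to interpret those files: they were captured roughly
>>> like this (a sketch; assumes the standard sysstat iostat and nodetool
>>> tools, and the output file names are just placeholders):
>>>
>>> # extended per-device stats, sampled every 5 seconds during the incident
>>> iostat -xm 5 > iostat_dying.log
>>> # thread pool backlog (pending/blocked stages) on the same node
>>> nodetool tpstats > tpstats_dying.log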
>>>
>>>
>>>
>>> On Wed, Feb 25, 2015 at 7:50 PM, Roni Balthazar
>>> <ronibaltha...@gmail.com> wrote:
>>>
>>>> Hi Piotr,
>>>>
>>>> Are your repairs finishing without errors?
>>>>
>>>> Regards,
>>>>
>>>> Roni Balthazar
>>>>
>>>> On 25 February 2015 at 15:43, Ja Sam <ptrstp...@gmail.com> wrote:
>>>> > Hi Roni,
>>>> > They aren't exactly balanced, but as I wrote before, they are in the
>>>> > range of 2500-6000.
>>>> > If you need exact numbers I will check them tomorrow morning. But all
>>>> > nodes in AGRAF have shown a small increase in pending compactions over
>>>> > the last week, which is the "wrong direction".
>>>> >
>>>> > I will check the compaction throughput in the morning, but my feeling
>>>> > about this parameter is that it doesn't change anything.
>>>> >
>>>> > Regards
>>>> > Piotr
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar
>>>> > <ronibaltha...@gmail.com> wrote:
>>>> >>
>>>> >> Hi Piotr,
>>>> >>
>>>> >> What about the nodes in AGRAF? Are the pending tasks balanced across
>>>> >> that DC's nodes as well?
>>>> >> You can check the pending compactions on each node.
>>>> >>
>>>> >> Also try running "nodetool getcompactionthroughput" on all nodes and
>>>> >> check whether the compaction throughput is set to 999.
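>>>> >>
>>>> >> For example, on each node (a sketch; 999 MB/s effectively removes the
>>>> >> throttle, and the default is 16):
>>>> >>
>>>> >> # check the current setting
>>>> >> nodetool getcompactionthroughput
>>>> >> # raise it if it is still at the default
>>>> >> nodetool setcompactionthroughput 999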
>>>> >>
>>>> >> Cheers,
>>>> >>
>>>> >> Roni Balthazar
>>>> >>
>>>> >> On 25 February 2015 at 14:47, Ja Sam <ptrstp...@gmail.com> wrote:
>>>> >> > Hi Roni,
>>>> >> >
>>>> >> > It is not balanced. As I wrote you last week, I have problems only
>>>> >> > in the DC we write to (on the screenshot it is named AGRAF:
>>>> >> > https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view).
>>>> >> > The problem is on ALL nodes in this DC.
>>>> >> > In the second DC (ZETO) only one node has more than 30 SSTables, and
>>>> >> > pending compactions are decreasing to zero.
>>>> >> >
>>>> >> > In AGRAF the minimum number of pending compactions is 2500 and the
>>>> >> > maximum is 6000 (the average on the OpsCenter screenshot is less
>>>> >> > than 5000).
>>>> >> >
>>>> >> > Regards
>>>> >> > Piotrek
>>>> >> >
>>>> >> > P.S. I don't know why my mail client displays my name as Ja Sam
>>>> >> > instead of Piotr Stapp, but this doesn't change anything :)
>>>> >> >
>>>> >> >
>>>> >> > On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar
>>>> >> > <ronibaltha...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> Hi Ja,
>>>> >> >>
>>>> >> >> How are the pending compactions distributed between the nodes?
>>>> >> >> Run "nodetool compactionstats" on all of your nodes and check
>>>> >> >> whether the pending tasks are balanced or concentrated on only a
>>>> >> >> few nodes.
>>>> >> >> You can also check whether the SSTable count is balanced by running
>>>> >> >> "nodetool cfstats" on your nodes.
>>>> >> >>
>>>> >> >> Cheers,
>>>> >> >>
>>>> >> >> Roni Balthazar
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> On 25 February 2015 at 13:29, Ja Sam <ptrstp...@gmail.com> wrote:
>>>> >> >> > I do NOT have SSDs. I have normal HDDs grouped as JBOD.
>>>> >> >> > My CF uses SizeTieredCompactionStrategy.
>>>> >> >> > I am using LOCAL_QUORUM for reads and writes. To be precise, I
>>>> >> >> > have a lot of writes and almost zero reads.
>>>> >> >> > I changed "cold_reads_to_omit" to 0.0, as someone suggested, and
>>>> >> >> > I set the compaction throughput to 999.
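>>>> >> >> >
>>>> >> >> > For reference, this is roughly what I ran (a sketch; the table is
>>>> >> >> > one of ours, and cold_reads_to_omit is the STCS sub-option set
>>>> >> >> > through cqlsh):
>>>> >> >> >
>>>> >> >> > # unthrottle compaction I/O on each node
>>>> >> >> > nodetool setcompactionthroughput 999
>>>> >> >> > # let STCS also consider "cold" SSTables for compaction
>>>> >> >> > cqlsh -e "ALTER TABLE prem_maelstrom_2.customer_events WITH
>>>> >> >> >   compaction = {'class': 'SizeTieredCompactionStrategy',
>>>> >> >> >                 'cold_reads_to_omit': '0.0'};"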
>>>> >> >> >
>>>> >> >> > So if my disks are idle, my CPU is below 40%, and I have some
>>>> >> >> > free RAM - why is the SSTable count growing? How can I speed up
>>>> >> >> > compactions?
>>>> >> >> >
>>>> >> >> > On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall
>>>> >> >> > <n...@thelastpickle.com> wrote:
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>> If you could be so kind as to validate the above and give me
>>>> >> >> >>> an answer: are my disks the real problem or not? And give me a
>>>> >> >> >>> tip on what I should do with the above cluster? Maybe I have a
>>>> >> >> >>> misconfiguration?
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >>
>>>> >> >> >> Your disks are effectively idle. What consistency level are you
>>>> >> >> >> using for reads and writes?
>>>> >> >> >>
>>>> >> >> >> Actually, 'await' is sort of weirdly high for idle SSDs. Check
>>>> >> >> >> your interrupt mappings (cat /proc/interrupts) and make sure the
>>>> >> >> >> interrupts are not being stacked on a single CPU.
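>>>> >> >> >>
>>>> >> >> >> Something like this (a sketch; the grep pattern depends on your
>>>> >> >> >> disk/NIC driver names, so adjust it for your hardware):
>>>> >> >> >>
>>>> >> >> >> # CPU header row plus the disk/network IRQ rows; watch whether
>>>> >> >> >> # one CPU column is taking all the interrupts
>>>> >> >> >> head -1 /proc/interrupts; grep -E 'sd|blk|eth' /proc/interrupts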
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>
