[jira] [Comment Edited] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-06 Thread YifanZhang (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315447#comment-17315447
 ] 

YifanZhang edited comment on KUDU-3271 at 4/6/21, 11:34 AM:


[~awong] I have attached that day's INFO log for the tablet that was being 
scanned. The tablet server crashed at about 16:34. At that time a user 
executed the query `select count(1) from xxx`.

An application deletes all records from this table and reloads new data every 
day. However, we failed to reproduce the problem by executing the same query 
today. :(

We set the tserver flag `--tablet_history_max_age_sec=10` because users don't 
usually need to read historical data.

 


was (Author: zhangyifan27):
[~awong] I have attached that day's INFO log for the tablet that was being 
scanned. The tablet server crashed at about 16:34. At that time a user 
executed the query `select count(1) from xxx`. An application deletes all 
records from this table and reloads new data every day. However, we failed to 
reproduce the problem by executing the same query today.

 

> Tablet server crashed when handle scan request
> --
>
> Key: KUDU-3271
> URL: https://issues.apache.org/jira/browse/KUDU-3271
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.12.0
> Reporter: YifanZhang
> Priority: Major
> Attachments: tablet-52a743.log
>
>
> We found that one of our Kudu tablet servers crashed while handling a scan 
> request. The scanned table had no row operations in progress at that time. 
> This issue has only come up once so far.
> The core dump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  kudu::tablet::DeltaApplier::HasNext (this=<optimized out>) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1  0x02185900 in kudu::UnionIterator::HasNext (this=<optimized out>) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2  0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner 
> (this=0x4fea140, scanner_id=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3  0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, 
> __in_chrg=<optimized out>) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4  kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x60edef0, req=req@entry=0x9582e880, 
> rpc_context=rpc_context@entry=0x8151d7800, 
> result_collector=result_collector@entry=0x7f2d721679f0, 
> has_more_results=has_more_results@entry=0x7f2d721678f9, 
> error_code=error_code@entry=0x7f2d721678fc) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5  0x009fb009 in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6  0x0210f019 in operator() (__args#2=0x8151d7800, 
> __args#1=0xb87b16de0, __args#0=<optimized out>, this=0x4e0c7708) at 
> /usr/include/c++/4.8.2/functional:2471
> #7  kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, 
> call=<optimized out>) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8  0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9  0x0228ecaf in operator() (this=0xc1a58c28) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674
> #11 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0
> #12 0x7f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-06 Thread YifanZhang (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315447#comment-17315447
 ] 

YifanZhang edited comment on KUDU-3271 at 4/6/21, 11:21 AM:


[~awong] I have attached that day's INFO log for the tablet that was being 
scanned. The tablet server crashed at about 16:34. At that time a user 
executed the query `select count(1) from xxx`. An application deletes all 
records from this table and reloads new data every day. However, we failed to 
reproduce the problem by executing the same query today.

 


was (Author: zhangyifan27):
[~awong] I have attached that day's INFO log for the tablet that was being 
scanned. The tablet server crashed at about 16:34. At that time a user 
executed the query `select count(1) from xxx`. However, we failed to reproduce 
the problem by executing the same query today.



[jira] [Comment Edited] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-02 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314068#comment-17314068
 ] 

Andrew Wong edited comment on KUDU-3271 at 4/2/21, 8:20 PM:


[~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs 
that you think might be useful in getting to the bottom of this? Do you know 
what scans were running around this time? Were there any special or new 
workloads running that hadn't run before?


was (Author: andrew.wong):
[~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs 
that you think might be useful in getting to the bottom of this?
