[jira] [Updated] (KUDU-2206) Kudu client create table timeout

2017-11-01 Thread ZhangZhen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangZhen updated KUDU-2206:

Attachment: trace_tserver07_trace.json, tserver07.flags

I let the trace "record" run for 40 minutes and attached the resulting trace 
JSON file, along with the flags of tserver07. Hope it helps.

> Kudu client create table timeout
> 
>
> Key: KUDU-2206
> URL: https://issues.apache.org/jira/browse/KUDU-2206
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.3.0
> Reporter: ZhangZhen
> Priority: Major
> Attachments: kudu_master.log, pstack.zip, trace_tserver07_trace.json, 
> tserver07.flags, tserver_01_0f53a0d3.log, tserver_07_23f962e4a1.log, 
> tsever_02_0a8bbcbb.log
>
>
> We encountered an RPC timeout exception when using Spark SQL, which uses the 
> Java Kudu client internally, to create a table on a Kudu cluster. The cluster 
> has 10 tservers and 1 master on 10 machines; the target table has 10 range 
> partitions and 5 hash partitions. 
> From the web UI, I found that it took about 3 minutes before all the tablets 
> elected a leader, and I can see many delete-tablet records in the UI, like:
> Delete Tablet Running 2.13 min 719f0f496bc34a469e4069b2861b4be8 Delete 
> Tablet RPC for TS=044f1da9a27c46acb82b1386f829f4dc
> I also found many retry records in the tserver logs, like:
> W1031 23:04:40.088256  5816 consensus_peers.cc:357] T 
> fcde65c4e4cf4df29b9ef9884ce292b2 P 0f53a0d3ef7e44ebb0365c800752d5bd -> Peer 
> 23f962e4a1744381ad5fa0d2d8b10241 (c3-kudu-tst-st07.bj:18700): Couldn't send 
> request to peer 23f962e4a1744381ad5fa0d2d8b10241 for tablet 
> fcde65c4e4cf4df29b9ef9884ce292b2. Error code: TABLET_NOT_RUNNING (12). 
> Status: Illegal state: Tablet not RUNNING: NOT_STARTED. Retrying in the next 
> heartbeat period. Already tried 94 times.
> The attachments contain the master and tserver logs, starting from when the 
> master received the create table request.
> The Kudu version is 1.3.0; the nearest commit is 
> 00813f96b9cb0c9ec57a17e5c85242f7679db0e0
> The exception that the client received looks like:
> Error: org.apache.kudu.client.NonRecoverableException: RPC can not complete 
> before timeout: KuduRpc(method=IsCreateTableDone, tablet=null, attempt=25, 
> DeadlineTracker(timeout=3, elapsed=28499), Traces: [0ms] sending RPC to 
> server , [0ms] received from server  response OK, [20ms] sending RPC to 
> server , [20ms] received from server  response OK, [40ms] sending RPC to 
> server , [40ms] received from server  response OK, [59ms] sending RPC to 
> server , [60ms] received from server  response OK, [80ms] sending RPC to 
> server , [80ms] received from server  response OK, [100ms] sending RPC to 
> server , [100ms] received from server  response OK, [140ms] sending RPC to 
> server , [141ms] received from server  response OK, [200ms] sending RPC to 
> server , [200ms] received from server  response OK, [319ms] sending RPC to 
> server , [320ms] received from server  response OK, [780ms] sending RPC to 
> server , [780ms] received from server  response OK, [2740ms] sending RPC to 
> server , [2741ms] received from server  response OK, [3580ms] sending RPC to 
> server , [3580ms] received from server  response OK, [4840ms] sending RPC to 
> server , [4840ms] received from server  response OK, [7080ms] sending RPC to 
> server , [7081ms] received from server  response OK, [8320ms] sending RPC to 
> server , [8321ms] received from server  response OK, [11620ms] sending RPC to 
> server , [11621ms] received from server  response OK, [13540ms] sending RPC 
> to server , [13540ms] received from server  response OK, [16819ms] sending 
> RPC to server , [16820ms] received from server  response OK, [19020ms] 
> sending RPC to server , [19020ms] received from server  response OK, 
> [21340ms] sending RPC to server , [21341ms] received from server  response 
> OK, [24660ms] sending RPC to server , [24661ms] received from server  
> response OK, [26800ms] sending RPC to server , [26800ms] received from server 
>  response OK, [27660ms] sending RPC to server , [27660ms] received from 
> server  response OK, [28480ms] sending RPC to server , [28481ms] received 
> from server
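
As a rough aid for reading the client trace quoted above, the following Python sketch (not part of the original report) pulls out the "sending RPC" timestamps and the gaps between consecutive attempts. It assumes the trace has been pasted as one string; the short excerpt and the helper names `send_times_ms` and `retry_gaps_ms` are illustrative only.

```python
import re

# Excerpt of the client-side trace from the report (first four attempts only).
TRACE = (
    "[0ms] sending RPC to server , [0ms] received from server  response OK, "
    "[20ms] sending RPC to server , [20ms] received from server  response OK, "
    "[40ms] sending RPC to server , [40ms] received from server  response OK, "
    "[59ms] sending RPC to server , [60ms] received from server  response OK"
)


def send_times_ms(trace):
    """Return the timestamp (in ms) of every 'sending RPC' event in the trace."""
    # \s+ tolerates the line wraps that the email formatting introduces.
    return [int(t) for t in re.findall(r"\[(\d+)ms\]\s+sending\s+RPC", trace)]


def retry_gaps_ms(trace):
    """Return the delay (in ms) between each pair of consecutive attempts."""
    times = send_times_ms(trace)
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print("send times:", send_times_ms(TRACE))
    print("gaps:", retry_gaps_ms(TRACE))
```

On this excerpt the gaps are about 20 ms. In the full trace the gaps between attempts grow to a few seconds, and each IsCreateTableDone call itself returns within about 1 ms.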



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KUDU-2206) Kudu client create table timeout

2017-10-31 Thread ZhangZhen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangZhen updated KUDU-2206:

Attachment: pstack.zip

> Kudu client create table timeout
> 
>
> Key: KUDU-2206
> URL: https://issues.apache.org/jira/browse/KUDU-2206
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.3.0
> Reporter: ZhangZhen
> Priority: Major
> Attachments: kudu_master.log, pstack.zip, tserver_01_0f53a0d3.log, 
> tserver_07_23f962e4a1.log, tsever_02_0a8bbcbb.log


