[jira] [Resolved] (KUDU-2226) Tablets with too many DRSs will cause a huge DMS memory overhead

2020-05-04 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-2226.
---
Fix Version/s: 1.12.0
   Resolution: Fixed

This is likely a duplicate of KUDU-3002, which is fixed in 1.12.0.

> Tablets with too many DRSs will cause a huge DMS memory overhead
> 
>
> Key: KUDU-2226
> URL: https://issues.apache.org/jira/browse/KUDU-2226
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Environment: CentOS6.5 Linux 2.6.32-431
> Kudu1.3.0 
> GitCommit 00813f96b9cb
> Reporter: ZhangZhen
> Priority: Major
> Fix For: 1.12.0
>
>
> I have a table with 10M rows in total, hash partitioned into 16 
> buckets. Each tablet is about 100MB on disk according to the /tablets 
> Web UI. Every day, 50K new rows are inserted into this table and about 5M 
> rows are updated, which is about half of all rows; each 
> row is updated only once. 
> Then I noticed something strange: according to the /mem-trackers UI of the TS, 
> every tablet of this table occupies about 900MB of memory, mostly in the 
> DeltaMemStore, with a peak memory consumption of about 1.8G. 
> I don't understand why the DeltaMemStore costs so much memory: 900MB of DMS 
> vs 100MB on disk seems strange to me. What's more, these 
> DMSs are flushed very slowly, so the memory stays occupied for a long 
> time, which causes "Soft memory limit exceeded" in the TS and, as a result, 
> "Rejecting consensus request".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3115) Improve scalability of Kudu masters

2020-05-04 Thread Alexey Serbin (Jira)
Alexey Serbin created KUDU-3115:
---

Summary: Improve scalability of Kudu masters
Key: KUDU-3115
URL: https://issues.apache.org/jira/browse/KUDU-3115
Project: Kudu
Issue Type: Improvement
Reporter: Alexey Serbin


Currently, the multiple masters in a multi-master Kudu cluster are used only for 
high availability and fault tolerance, not for sharing the load 
among the available master nodes.  For example, Kudu clients detect the current 
leader master upon connecting to the cluster and send all their subsequent 
requests to the leader master, so serving many more clients requires running 
masters on more powerful nodes.  The current design assumes that masters store and 
process requests for metadata only, but that makes sense only up to some 
limit on the rate of incoming client requests.

It would be great to achieve better 'horizontal' scalability for Kudu masters.





[jira] [Commented] (KUDU-3114) tserver writes core dump when reporting 'out of space'

2020-05-04 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099242#comment-17099242
 ] 

Alexey Serbin commented on KUDU-3114:
-

Right, it's possible to disable coredumps for Kudu processes by adding 
{{--disable_core_dumps}} even if the limit for core file size is set to 
non-zero.  My point was that enabling/disabling coredumps per {{LOG(FATAL)}} 
instance is not feasible.

Dumping a core file might make sense when troubleshooting an issue: e.g., if 
there is a bug in computing the number of bytes to allocate, or to find out what 
event triggered the issue when an unexpectedly high amount of space is 
requested, etc.  Probably, we can keep that for DEBUG builds only.

I'm OK with keeping this JIRA item open (so I'm re-opening it).  Feel free to 
submit a patch to address the issue as needed.

> tserver writes core dump when reporting 'out of space'
> --
>
> Key: KUDU-3114
> URL: https://issues.apache.org/jira/browse/KUDU-3114
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.7.1
> Reporter: Balazs Jeszenszky
> Priority: Major
> Fix For: n/a
>
>
> Fatal log has:
> {code}
> F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation 
> failed to write to WAL: IO error: Insufficient disk space to allocate 8388608 
> bytes under path  (39973171200 bytes available vs 39988335247 bytes 
> reserved) (error 28)
> {code}
> Generating a core file in this case yields no benefit, and potentially 
> compounds the problem.





[jira] [Reopened] (KUDU-3114) tserver writes core dump when reporting 'out of space'

2020-05-04 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin reopened KUDU-3114:
-






[jira] [Commented] (KUDU-3114) tserver writes core dump when reporting 'out of space'

2020-05-04 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099094#comment-17099094
 ] 

Balazs Jeszenszky commented on KUDU-3114:
-

This can be controlled from the application, e.g.:
https://github.com/apache/kudu/blob/branch-1.12.x/src/kudu/util/os-util.cc#L125-L144

However, I'm aware that FATAL errors always generate a core if enabled, which is 
otherwise preferable. So the request is to turn this into an ERROR instead and 
exit cleanly. Best practices on space allocation aside, there is no benefit to 
dumping core at this point IMO.
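
For illustration, here is a minimal sketch of what "controlled from the application" can look like on a POSIX system: the process lowers its own soft core file size limit to zero. This is not the code at the link above; it is a Python analogue, and the function name disable_core_dumps is hypothetical.

```python
import resource


def disable_core_dumps():
    """Set the soft core file size limit to 0, leaving the hard limit as is."""
    # Hypothetical helper for illustration, not Kudu's implementation.
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))


disable_core_dumps()
soft, _ = resource.getrlimit(resource.RLIMIT_CORE)
print("soft core limit:", soft)
```

Lowering a soft limit never needs extra privileges, so a process can do this at startup regardless of the limit it inherited.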






[jira] [Resolved] (KUDU-3114) tserver writes core dump when reporting 'out of space'

2020-05-04 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin resolved KUDU-3114.
-
Fix Version/s: n/a
   Resolution: Information Provided






[jira] [Commented] (KUDU-3114) tserver writes core dump when reporting 'out of space'

2020-05-04 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099075#comment-17099075
 ] 

Alexey Serbin commented on KUDU-3114:
-

Thank you for reporting the issue.

The way fatal inconsistencies are handled in Kudu doesn't provide control 
over coredump behavior.  That behavior is controlled at a 
different level: the environment that Kudu processes run in (check 
{{ulimit -c}}).

As a good operational practice, it's advised to separate the location for core 
files (e.g., some directory on the system partition/volume) from the directories 
where Kudu stores its data and WAL.  Also, consider [enabling mini-dumps in 
Kudu|https://kudu.apache.org/docs/troubleshooting.html#crash_reporting] and 
disabling core files if dumping cores isn't feasible due to space limitations.
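
As a rough sketch of the check suggested above, the limit that {{ulimit -c}} reports can also be read from inside a process. The Python snippet below is only an illustration; the helper names core_limit and describe are hypothetical and are not part of Kudu.

```python
import resource


def core_limit():
    """Return the (soft, hard) core file size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value the way a person would read it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


soft, hard = core_limit()
print(f"soft={describe(soft)} hard={describe(hard)}")
if soft == 0:
    print("core files are disabled for this process")
```

Child processes inherit these limits, so the value seen here reflects the environment the process was started in. Note that shells typically report {{ulimit -c}} in blocks rather than bytes.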






[jira] [Created] (KUDU-3114) tserver writes core dump when reporting 'out of space'

2020-05-04 Thread Balazs Jeszenszky (Jira)
Balazs Jeszenszky created KUDU-3114:
---

Summary: tserver writes core dump when reporting 'out of space'
Key: KUDU-3114
URL: https://issues.apache.org/jira/browse/KUDU-3114
Project: Kudu
Issue Type: Bug
Components: tserver
Affects Versions: 1.7.1
Reporter: Balazs Jeszenszky


Fatal log has:
{code}
F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation 
failed to write to WAL: IO error: Insufficient disk space to allocate 8388608 
bytes under path  (39973171200 bytes available vs 39988335247 bytes 
reserved) (error 28)
{code}

Generating a core file in this case yields no benefit, and potentially 
compounds the problem.
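
To make the numbers in the log line easier to read, here is a small worked example. It assumes the check compares the requested bytes against the available bytes minus the reserved bytes; that is a reading of the message text, not a confirmed description of Kudu's logic, and the function name check_allocation is hypothetical.

```python
def check_allocation(requested, available, reserved):
    """Return (ok, headroom), where headroom is what is left after the reservation."""
    # Assumed rule for illustration: the request must fit in available - reserved.
    headroom = available - reserved
    return requested <= headroom, headroom


# Values copied from the fatal log line above.
ok, headroom = check_allocation(
    requested=8388608,       # 8 MiB
    available=39973171200,
    reserved=39988335247,
)
print(ok, headroom)  # False -15164047
```

Under that assumption, the volume was already about 15MB below its reservation before the 8 MiB request arrived, so writing a core file to the same volume would only shrink the remaining space further.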


