[jira] [Resolved] (KUDU-2226) Tablets with too many DRSs will cause a huge DMS memory overhead
[ https://issues.apache.org/jira/browse/KUDU-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong resolved KUDU-2226.
-------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed

This is likely a dupe of KUDU-3002, which is fixed in 1.12.0.

> Tablets with too many DRSs will cause a huge DMS memory overhead
> ----------------------------------------------------------------
>
>                 Key: KUDU-2226
>                 URL: https://issues.apache.org/jira/browse/KUDU-2226
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>         Environment: CentOS6.5 Linux 2.6.32-431
>                      Kudu1.3.0
>                      GitCommit 00813f96b9cb
>            Reporter: ZhangZhen
>            Priority: Major
>             Fix For: 1.12.0
>
>
> I have a table with 10M rows in total and has been hash partitioned to 16
> buckets. Each tablet has about 100MB on disk size according to the /tablets
> Web UI. Everyday 50K new rows will be inserted into this table, and about 5M
> rows of this table will be updated, that's about half of rows in total, each
> row will be updated only once.
> Then I found something strange, from the /mem-trackers UI of TS, I found
> every tablet of this table occupied about 900MB memory, mainly occupied by
> DeltaMemStore, the peak memory consumption is about 1.8G.
> I don't understand why the DeltaMemStore will cost so much memory, 900MB DMS
> vs 100MB on disk size, that seems strange to me. What's more, I found these
> DMS will be flushed very slowly, so for a long time these memory are
> occupied, which cause "Soft memory limit exceeded" in the TS, and in result
> cause "Rejecting consensus request".

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
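To put the reported numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes the 5M daily updates are spread evenly over the 16 hash buckets, that "MB" means 2^20 bytes, and that the whole 900MB of DeltaMemStore memory per tablet is attributable to one day's updates. None of these assumptions is confirmed by the report, and the function name is hypothetical.

```python
MIB = 1024 * 1024  # assumption: "MB" in the report means 2**20 bytes


def dms_bytes_per_update(dms_bytes_per_tablet, updates_per_day, num_tablets):
    """Average DeltaMemStore bytes per updated row (hypothetical helper).

    Assumes updates are distributed evenly across tablets.
    """
    updates_per_tablet = updates_per_day / num_tablets
    return dms_bytes_per_tablet / updates_per_tablet


if __name__ == "__main__":
    per_update = dms_bytes_per_update(900 * MIB, 5_000_000, 16)
    total_dms = 16 * 900 * MIB
    print(f"approx. DMS bytes per updated row: {per_update:.0f}")
    print(f"approx. DMS memory across all 16 tablets: {total_dms / MIB:.0f} MiB")
```

Under these assumptions each tablet receives about 312,500 updates per day, which works out to roughly 3 KB of DeltaMemStore memory per updated row. That is consistent with the reporter's impression that the overhead is far out of proportion to the 100MB on-disk size.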
[jira] [Created] (KUDU-3115) Improve scalability of Kudu masters
Alexey Serbin created KUDU-3115:
-----------------------------------

             Summary: Improve scalability of Kudu masters
                 Key: KUDU-3115
                 URL: https://issues.apache.org/jira/browse/KUDU-3115
             Project: Kudu
          Issue Type: Improvement
            Reporter: Alexey Serbin

Currently, multiple masters in a multi-master Kudu cluster are used only for high availability and fault tolerance, not for sharing the load among the available master nodes. For example, Kudu clients detect the current leader master upon connecting to the cluster and send all their subsequent requests to the leader master, so serving many more clients requires running masters on more powerful nodes. The current design assumes that masters store and process requests for metadata only, but that makes sense only up to some limit on the rate of incoming client requests.

It would be great to achieve better 'horizontal' scalability for Kudu masters.
[jira] [Commented] (KUDU-3114) tserver writes core dump when reporting 'out of space'
[ https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099242#comment-17099242 ]

Alexey Serbin commented on KUDU-3114:
-------------------------------------

Right, it's possible to disable coredumps for Kudu processes by adding {{--disable_core_dumps}} even if the limit for core file size is set to non-zero. My point was that enabling/disabling coredumps per {{LOG(FATAL)}} instance is not feasible.

Dumping a core file might make sense when troubleshooting an issue: e.g., if there is a bug in computing the number of bytes to allocate, what event triggered the issue if it's requested to allocate an unexpectedly high amount of space, etc. Probably, we can keep that for DEBUG builds only.

I'm OK with keeping this JIRA item open (so, I'm re-opening it). Feel free to submit a patch to address the issue as needed.

> tserver writes core dump when reporting 'out of space'
> ------------------------------------------------------
>
>                 Key: KUDU-3114
>                 URL: https://issues.apache.org/jira/browse/KUDU-3114
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: Balazs Jeszenszky
>            Priority: Major
>             Fix For: n/a
>
>
> Fatal log has:
> {code}
> F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation
> failed to write to WAL: IO error: Insufficient disk space to allocate 8388608
> bytes under path (39973171200 bytes available vs 39988335247 bytes
> reserved) (error 28)
> {code}
> Generating a core file in this case yields no benefit, and potentially
> compounds the problem.
[jira] [Reopened] (KUDU-3114) tserver writes core dump when reporting 'out of space'
[ https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin reopened KUDU-3114:
---------------------------------

> tserver writes core dump when reporting 'out of space'
> ------------------------------------------------------
>
>                 Key: KUDU-3114
>                 URL: https://issues.apache.org/jira/browse/KUDU-3114
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: Balazs Jeszenszky
>            Priority: Major
>             Fix For: n/a
>
>
> Fatal log has:
> {code}
> F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation
> failed to write to WAL: IO error: Insufficient disk space to allocate 8388608
> bytes under path (39973171200 bytes available vs 39988335247 bytes
> reserved) (error 28)
> {code}
> Generating a core file in this case yields no benefit, and potentially
> compounds the problem.
[jira] [Commented] (KUDU-3114) tserver writes core dump when reporting 'out of space'
[ https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099094#comment-17099094 ]

Balazs Jeszenszky commented on KUDU-3114:
-----------------------------------------

This can be controlled from the application, e.g.:
https://github.com/apache/kudu/blob/branch-1.12.x/src/kudu/util/os-util.cc#L125-L144
but I'm aware FATAL errors always generate a core if enabled, which is otherwise preferable. So the request is to turn this into an ERROR instead and exit cleanly. Best practices on space allocation aside, there is no benefit to dumping core at this point IMO.

> tserver writes core dump when reporting 'out of space'
> ------------------------------------------------------
>
>                 Key: KUDU-3114
>                 URL: https://issues.apache.org/jira/browse/KUDU-3114
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: Balazs Jeszenszky
>            Priority: Major
>             Fix For: n/a
>
>
> Fatal log has:
> {code}
> F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation
> failed to write to WAL: IO error: Insufficient disk space to allocate 8388608
> bytes under path (39973171200 bytes available vs 39988335247 bytes
> reserved) (error 28)
> {code}
> Generating a core file in this case yields no benefit, and potentially
> compounds the problem.
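As a rough illustration of what "controlled from the application" can mean, the sketch below lowers the soft core-file size limit of the current process to zero using Python's standard `resource` module, a wrapper around POSIX `getrlimit`/`setrlimit`. It is not Kudu's code, and the helper name `disable_core_dumps` is hypothetical. It only shows the general mechanism on a Unix-like system.

```python
import resource


def disable_core_dumps():
    """Set the soft RLIMIT_CORE of this process to 0.

    Hypothetical helper for illustration only; not part of Kudu.
    Returns the previous soft limit so a caller could restore it.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always lower its soft limit;
    # the hard limit is left unchanged.
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return old_soft


if __name__ == "__main__":
    previous = disable_core_dumps()
    print("previous soft limit:", previous)
    print("current (soft, hard):", resource.getrlimit(resource.RLIMIT_CORE))
```

A limit set this way applies to the whole process, which matches the point made elsewhere in this thread that the choice cannot easily be made per fatal error.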
[jira] [Resolved] (KUDU-3114) tserver writes core dump when reporting 'out of space'
[ https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin resolved KUDU-3114.
---------------------------------
    Fix Version/s: n/a
       Resolution: Information Provided

> tserver writes core dump when reporting 'out of space'
> ------------------------------------------------------
>
>                 Key: KUDU-3114
>                 URL: https://issues.apache.org/jira/browse/KUDU-3114
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: Balazs Jeszenszky
>            Priority: Major
>             Fix For: n/a
>
>
> Fatal log has:
> {code}
> F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation
> failed to write to WAL: IO error: Insufficient disk space to allocate 8388608
> bytes under path (39973171200 bytes available vs 39988335247 bytes
> reserved) (error 28)
> {code}
> Generating a core file in this case yields no benefit, and potentially
> compounds the problem.
[jira] [Commented] (KUDU-3114) tserver writes core dump when reporting 'out of space'
[ https://issues.apache.org/jira/browse/KUDU-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099075#comment-17099075 ]

Alexey Serbin commented on KUDU-3114:
-------------------------------------

Thank you for reporting the issue.

The way fatal inconsistencies are handled in Kudu doesn't provide a way to choose the coredump behavior. That behavior is controlled at a different level: the environment that Kudu processes are run with (check {{ulimit -c}}).

As a good operational practice, it's advised to separate the location for core files (some directory on the system partition/volume?) from the directories where Kudu stores its data and WAL. Also, consider [enabling mini-dumps in Kudu|https://kudu.apache.org/docs/troubleshooting.html#crash_reporting] and disabling core files if dumping cores isn't feasible due to space limitations.

> tserver writes core dump when reporting 'out of space'
> ------------------------------------------------------
>
>                 Key: KUDU-3114
>                 URL: https://issues.apache.org/jira/browse/KUDU-3114
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: Balazs Jeszenszky
>            Priority: Major
>
> Fatal log has:
> {code}
> F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation
> failed to write to WAL: IO error: Insufficient disk space to allocate 8388608
> bytes under path (39973171200 bytes available vs 39988335247 bytes
> reserved) (error 28)
> {code}
> Generating a core file in this case yields no benefit, and potentially
> compounds the problem.
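For the operational checks mentioned in the comment above, here is a minimal sketch of how one might inspect the relevant settings on Linux. It reads the soft core-file size limit, the value behind `ulimit -c`, and the kernel's `core_pattern`, which determines where core files are written. The function names are hypothetical, and the `/proc` path is Linux-specific.

```python
import resource


def core_limit_summary():
    """Describe the soft core-file size limit of this process.

    getrlimit reports bytes, whereas shells typically print
    `ulimit -c` in blocks, so the raw numbers may differ.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    if soft == 0:
        return "disabled"
    return f"{soft} bytes"


def core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the kernel core_pattern string, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("core file size limit:", core_limit_summary())
    print("core_pattern:", core_pattern())
```

If `core_pattern` is a relative file name, core files land in the crashing process's working directory. This is one way they can end up on the same volume as data or WAL directories.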
[jira] [Created] (KUDU-3114) tserver writes core dump when reporting 'out of space'
Balazs Jeszenszky created KUDU-3114:
---------------------------------------

             Summary: tserver writes core dump when reporting 'out of space'
                 Key: KUDU-3114
                 URL: https://issues.apache.org/jira/browse/KUDU-3114
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.7.1
            Reporter: Balazs Jeszenszky

Fatal log has:
{code}
F0503 23:56:27.359544 40012 status_callback.cc:35] Enqueued commit operation failed to write to WAL: IO error: Insufficient disk space to allocate 8388608 bytes under path (39973171200 bytes available vs 39988335247 bytes reserved) (error 28)
{code}
Generating a core file in this case yields no benefit, and potentially compounds the problem.
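The numbers in the log line can be checked by hand. The sketch below assumes the check rejects an allocation when the free space left after it would fall below the reserved amount. That rule is an assumption made for illustration, not a quote of Kudu's code, and `has_sufficient_space` is a hypothetical name.

```python
def has_sufficient_space(available, requested, reserved):
    """Return True if the allocation leaves at least `reserved` bytes free.

    Assumed rule for illustration; not taken from Kudu's source.
    """
    return available - requested >= reserved


if __name__ == "__main__":
    available = 39973171200  # "bytes available" from the log line
    requested = 8388608      # 8 MiB allocation from the log line
    reserved = 39988335247   # "bytes reserved" from the log line

    print("sufficient:", has_sufficient_space(available, requested, reserved))
    print("already below reserve by:", reserved - available, "bytes")
    print("shortfall including request:", reserved - (available - requested), "bytes")
```

With these values the volume is already about 15 MB under the reserved threshold before the 8 MiB request is considered. A core file written to the same volume would therefore compete for space that is already exhausted.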