Top quality spelunking - always fun to read - thanks Martin!

> On 28 Jun 2019, at 10:24, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:
>
> Bryan,
>
> We saw that Riak was using much more memory than expected at the end of
> the handoffs. Using `riak-admin top` we could see that this wasn't process
> memory, but binaries. We first did some work via attach, looping over
> processes and running GC, to confirm that this wasn't a failure to collect
> garbage - the references to memory were real. We then wrote some functions
> in attach to analyse process_info/2 for each process (looking at binary and
> memory), and discovered that there were penciller processes holding lots of
> references to large binaries (this accounted for all the unexpected memory
> use), and that the penciller was the only process with a reference to each
> binary. This made no sense initially, as the penciller should only hold
> small binaries (metadata). We then looked at the running state of the
> penciller processes and could see no large binaries in the state, but could
> see that a lot of the active keys in the penciller were keys known to have
> large object values (but small amounts of metadata) - and that the sizes of
> the object values were the same as the sizes of the binary references found
> on the penciller process via process_info/2.
>
> I then recalled the first part of this:
> https://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html
>
> It was obvious that in extracting the metadata the beam was naturally
> retaining a reference to the whole binary, as long as the sub-binary was
> retained by a process (the Penciller). Forcing a binary copy resolved
> this referencing issue.
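For anyone following along, the sub-binary behaviour described above can be reproduced in a few lines. This is only a minimal sketch, not the leveled code: the module name `subbinary_demo`, the 2 MB object and the 1024-byte "metadata" prefix are all made up for illustration. It uses `binary:referenced_byte_size/1` and `binary:copy/1` from the standard `binary` module, plus `erlang:process_info(Pid, binary)`, whose result format is a debugging aid that the OTP docs say may change between releases.

```erlang
-module(subbinary_demo).
-export([run/0, binary_bytes/1]).

%% Sum the sizes of the off-heap binaries a process currently references.
%% Assumption: process_info(Pid, binary) returns {Id, Size, RefCount}
%% tuples, as current OTP releases do; entries of any other shape are
%% silently skipped by the comprehension.
binary_bytes(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Bins} ->
            lists:sum([Size || {_Id, Size, _RefCount} <- Bins]);
        undefined ->
            %% The process has already exited.
            0
    end.

run() ->
    %% Hypothetical 2 MB object; its first 1024 bytes stand in for metadata.
    Object = binary:copy(<<0>>, 2 * 1024 * 1024),
    <<Meta:1024/binary, _Rest/binary>> = Object,
    %% Meta is a sub-binary, so holding it keeps the whole object alive.
    Retained = binary:referenced_byte_size(Meta),
    %% binary:copy/1 allocates a fresh binary with only the metadata bytes.
    Copied = binary:copy(Meta),
    AfterCopy = binary:referenced_byte_size(Copied),
    {byte_size(Meta), Retained, AfterCopy}.
```

After compiling the module, `subbinary_demo:run()` returns the metadata size, the bytes referenced before the copy and the bytes referenced after it. The middle value should be about the size of the whole object and the last about the size of the metadata alone. `subbinary_demo:binary_bytes(Pid)` gives a rough per-process total in the spirit of the process_info/2 analysis described above.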
> It was nice that the same tools used to detect the issue made it quite
> easy to write a test to confirm resolution -
> https://github.com/martinsumner/leveled/blob/master/test/end_to_end/riak_SUITE.erl#L1214-L1239
>
> The memory leak section of Fred Hebert's http://www.erlang-in-anger.com/
> is great reading for helping with these types of issues.
>
> Thanks
>
> Martin
>
>
> On Fri, 28 Jun 2019 at 09:46, b h <bryanhuntwit...@gmail.com> wrote:
> Nice work - I've read the issue / PR - how did you discover / track it
> down - tools or just reading the code?
>
> On Fri, 28 Jun 2019 at 09:35, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:
> There is now a second update available for 2.9.0:
> https://github.com/basho/riak/tree/riak-2.9.0p2
>
> This patch, like the patch before, resolves a memory management issue in
> leveled, which this time could be triggered by sending many large objects
> in a short period of time. The underlying problem is described a bit
> further in the leveled issue below, and is resolved by leveled working
> more sympathetically with the beam binary memory management.
> https://github.com/martinsumner/leveled/issues/285
>
> Switching to the patched version is not urgent unless you are using the
> leveled backend and may send a large number of large objects in a burst.
>
> Updated packages are available (thanks to Nick Adams at TI Tokyo) -
> https://files.tiot.jp/riak/kv/2.9/2.9.0p2/
>
> Thanks again to the testing team at the NHS Spine project, Aaron Gibbon
> (BJSS) and Ramen Sen, who discovered the problem.
> The issue was discovered in a handoff scenario where there were tens of
> thousands of 2MB objects stored in a portion of the keyspace at the end of
> the handoff - which led to memory issues until either more PUTs were
> received (to force a persist to disk) or a restart occurred.
>
> Regards
>
>
> On Sat, 25 May 2019 at 09:35, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:
> Unfortunately, Riak 2.9.0 was released with an issue whereby a race
> condition in heavy-PUT scenarios (e.g. handoffs) could cause a leak of
> file descriptors.
>
> The issue is described here:
> https://github.com/basho/riak_kv/issues/1699
> and the underlying issue here:
> https://github.com/martinsumner/leveled/issues/278
>
> There is a new patched version of the release available (2.9.0p1) at
> https://github.com/basho/riak/tree/riak-2.9.0p1
> This should be used in preference to the original release of 2.9.0.
>
> Updated packages are available (thanks to Nick Adams at TI Tokyo) -
> https://files.tiot.jp/riak/kv/2.9/2.9.0p1/
>
> Thanks also to the testing team at the NHS Spine project, Aaron Gibbon
> (BJSS) and Ramen Sen, who discovered the problem.
>
> Regards
>
> Martin
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com