Top-quality spelunking - always fun to read - thanks Martin!
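
For anyone following along, here is a minimal Erlang sketch of the sub-binary retention Martin describes below. It is not leveled's code: the module name `subbin_demo`, the function names and the 100-byte metadata size are made up for illustration. It relies only on `erlang:process_info/2`, `binary:referenced_byte_size/1` and `binary:copy/1`.

```erlang
-module(subbin_demo).
-export([retained_bytes/1, extract_meta/2, extract_meta_copy/2, demo/0]).

%% Sum the sizes of the off-heap binaries that Pid holds references to.
%% process_info(Pid, binary) is documented as subject to change; this
%% assumes the current {Id, Size, RefCount} tuple layout. The same binary
%% can be listed more than once, so de-duplicate on Id first.
retained_bytes(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Refs} ->
            Unique = lists:ukeysort(1, Refs),
            lists:sum([Size || {_Id, Size, _RefCount} <- Unique]);
        undefined ->
            %% The process has already exited.
            0
    end.

%% Hypothetical metadata extraction: the match yields a sub-binary that
%% still points into Object, so Object stays alive while Meta does.
extract_meta(Object, MetaSize) ->
    <<Meta:MetaSize/binary, _Value/binary>> = Object,
    Meta.

%% Same extraction, but binary:copy/1 makes a standalone binary so the
%% large parent can be collected once nothing else refers to it.
extract_meta_copy(Object, MetaSize) ->
    binary:copy(extract_meta(Object, MetaSize)).

%% Build a 2MB object and compare how much memory each result pins.
demo() ->
    Object = binary:copy(<<0>>, 2 * 1024 * 1024),
    Sub = extract_meta(Object, 100),
    Copy = extract_meta_copy(Object, 100),
    {binary:referenced_byte_size(Sub), binary:referenced_byte_size(Copy)}.
```

Calling `subbin_demo:demo()` from an attached shell should return a pair whose first element is the full 2MB object size (the matched sub-binary still pins the whole object) and whose second element is 100 (the copy stands alone). `subbin_demo:retained_bytes(Pid)` is the sort of per-process check that can be looped over `erlang:processes()` to find which process is holding the binaries.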

> On 28 Jun 2019, at 10:24, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:
> 
> Bryan,
> 
> We saw that Riak was using much more memory than expected at the end of 
> the handoffs.  Using `riak-admin top` we could see that this wasn't process 
> memory, but binaries.  First I did some work via attach, looping over 
> processes and running GC, to confirm that this wasn't a failure to collect 
> garbage - the references to memory were real.  Then I did a bit of work in 
> attach, writing some functions to analyse process_info/2 for each process 
> (looking at binary and memory), and discovered that there were penciller 
> processes holding lots of references to large binaries, where the penciller 
> was the only process with a reference to each binary.  These references 
> accounted for all the unexpected memory use.  This made no sense initially, 
> as the penciller should only hold small binaries (metadata).  I then looked 
> at the running state of the penciller processes and could see no large 
> binaries in the state, but could see that a lot of the active keys in the 
> penciller were keys known to have large object values (but small amounts of 
> metadata) - and that the sizes of the object values were the same as the 
> sizes of the binary references found on the penciller process via 
> process_info/2.
> 
> I then recalled the first part of this: 
> https://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html
> It was obvious that in extracting the metadata the beam was naturally 
> retaining a reference to the whole binary, for as long as the sub-binary was 
> retained by a process (the Penciller).  Forcing a binary copy resolved 
> this referencing issue.  It was nice that the same tools used to detect the 
> issue made it quite easy to write a test to confirm resolution: 
> https://github.com/martinsumner/leveled/blob/master/test/end_to_end/riak_SUITE.erl#L1214-L1239
> 
> The memory leak section of Fred Hebert's http://www.erlang-in-anger.com/ 
> is great reading for help with these types of issues.
> 
> Thanks
> 
> Martin
> 
> 
> On Fri, 28 Jun 2019 at 09:46, b h <bryanhuntwit...@gmail.com> wrote:
> Nice work - I've read the issue / PR - how did you discover / track it down - 
> tools or just reading the code?
> 
> On Fri, 28 Jun 2019 at 09:35, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:
> There is now a second update available for 2.9.0: 
> https://github.com/basho/riak/tree/riak-2.9.0p2
> 
> This patch, like the patch before, resolves a memory management issue in 
> leveled, which this time could be triggered by sending many large objects in 
> a short period of time.  The underlying problem is described a bit further 
> at https://github.com/martinsumner/leveled/issues/285, and is resolved by 
> leveled working more sympathetically with the beam binary memory management.
> 
> Switching to the patched version is not urgent unless you are using the 
> leveled backend, and may send a large number of large objects in a burst.  
> 
> Updated packages are available (thanks to Nick Adams at TI Tokyo) - 
> https://files.tiot.jp/riak/kv/2.9/2.9.0p2/
> 
> Thanks again to the testing team at the NHS Spine project, Aaron Gibbon 
> (BJSS) and Ramen Sen, who discovered the problem.  The issue was discovered 
> in a handoff scenario where there were tens of thousands of 2MB objects 
> stored in a portion of the keyspace at the end of the handoff - which led to 
> memory issues until either more PUTs were received (to force a persist to 
> disk) or a restart occurred.
> 
> Regards
> 
> 
> On Sat, 25 May 2019 at 09:35, Martin Sumner <martin.sum...@adaptip.co.uk> wrote:
> Unfortunately, Riak 2.9.0 was released with an issue whereby a race condition 
> in heavy-PUT scenarios (e.g. handoffs) could cause a leak of file 
> descriptors.
> 
> The issue is described here - https://github.com/basho/riak_kv/issues/1699, 
> and the underlying issue here - 
> https://github.com/martinsumner/leveled/issues/278.
> 
> There is a new patched version of the release available (2.9.0p1) at 
> https://github.com/basho/riak/tree/riak-2.9.0p1.  This should be used in 
> preference to the original release of 2.9.0.
> 
> Updated packages are available (thanks to Nick Adams at TI Tokyo) - 
> https://files.tiot.jp/riak/kv/2.9/2.9.0p1/
> 
> Thanks also to the testing team at the NHS Spine project, Aaron Gibbon (BJSS) 
> and Ramen Sen, who discovered the problem.
> 
> Regards
> 
> Martin
> 
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

