Re: [DISCUSS] CEP-19: Trie memtable implementation

2022-02-05 Thread Dinesh Joshi
This is excellent. Thanks for opening up this CEP. It would be great to get 
some stats around GC allocation rate / memory pressure, read & write latencies, 
etc. compared to existing implementation.

Dinesh

> On Jan 18, 2022, at 2:13 AM, Branimir Lambov wrote:
> 
> The memtable pluggability API (CEP-11) is per-table to enable memtable 
> selection that suits specific workflows. It also makes full sense to permit 
> per-node configuration, both to be able to modify the configuration to suit 
> heterogeneous deployments better, as well as to test changes for improvements 
> such as this one.
> Recognizing this, the patch comes with a modification to the API that
> defines memtable templates in cassandra.yaml (i.e. per node) and allows
> the schema to select a template (in addition to being able to specify the
> full memtable configuration). One could use this e.g. by adding:
> memtable_templates:
>     trie:
>         class: TrieMemtable
>         shards: 16
>     skiplist:
>         class: SkipListMemtable
> memtable:
>     template: skiplist
> (which defines two templates and specifies the default memtable
> implementation to use) to cassandra.yaml and specifying
> WITH memtable = {'template' : 'trie'} in the table schema.
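> 
> As a minimal sketch of the schema side, assuming the template-selection
> syntax described above and a hypothetical keyspace and table (ks.events,
> which is not part of the patch), a table could opt into the trie template
> while other tables keep the node's default:
> 
> ```cql
> -- Hypothetical table; only the WITH memtable clause comes from this message.
> CREATE TABLE ks.events (
>     id uuid PRIMARY KEY,
>     payload text
> ) WITH memtable = {'template' : 'trie'};
> ```
> 
> The exact option syntax may differ in the committed version of the API.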
> 
> I intend to commit this modification with the memtable API 
> (CASSANDRA-17034/CEP-11).
> 
> Performance comparisons will be published soon.
> 
> Regards,
> Branimir
> 
> On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa wrote:
> Sounds like a great addition
> 
> Can you share some of the details around gc and latency improvements you’ve 
> observed with the list? 
> 
> Any specific reason the configuration is through schema vs yaml? Presumably 
> it’s so a user can test per table, but this changes every host in a cluster, 
> so the impact of a bug/regression is much higher. 
> 
> 
>> On Jan 10, 2022, at 1:30 AM, Branimir Lambov wrote:
>> 
>> 
>> We would like to contribute our TrieMemtable to Cassandra. 
>> 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation
>> 
>> This is a new memtable solution intended to replace the legacy
>> implementation, developed with the following objectives:
>> - lowering the on-heap complexity and making it possible to store memtable
>> indexing structures off-heap,
>> - leveraging byte order and a trie structure to lower the memory footprint 
>> and improve mutation and lookup performance.
>> 
>> The new memtable relies on CASSANDRA-6936 to translate to and from 
>> byte-ordered representations of types, and CASSANDRA-17034 / CEP-11 to plug 
>> into Cassandra. The memtable is built on multiple shards of custom in-memory 
>> single-writer multiple-reader tries, whose implementation uses a combination 
>> of state-of-the-art and novel features for greater efficiency.
>> 
>> The CEP's JIRA ticket (https://issues.apache.org/jira/browse/CASSANDRA-17240)
>> contains the initial version of the implementation. In its current form it
>> achieves much better garbage collection latency, significantly bigger data
>> sizes between flushes for the same memory allocation, as well as drastically
>> increased write throughput, and we expect the memory and garbage collection
>> improvements to go much further with upcoming improvements to the solution.
>> 
>> I am interested in hearing your thoughts on the proposal.
>> 
>> Regards,
>> Branimir
>> 



Re: [DISCUSS] Non Coding Committers

2022-02-05 Thread Dinesh Joshi
Hi Sharan,

Thank you for the feedback. One point that immediately struck me was the 
use of the word 'committer'. This term is closely associated with code 
contributions and therefore causes confusion. I understand that the ASF defines 
it differently, but it is unfortunately overloaded. Perhaps the ASF can consider 
using a different term that is not strongly associated with code contributions?

I am very supportive of recognizing non-code contributions. They are valuable 
for the community and need to be appropriately recognized by the project.

Over the years, one of the repeated concerns I have heard is that a non-coding 
committer may make changes to the code causing issues. I find this concern 
silly. As you described with your experience, people with commit bits are aware 
of their role and responsibilities. They will rarely commit code in an area 
that they don't understand. In the worst case we can always revert it ;)

Dinesh

> On Feb 5, 2022, at 7:37 AM, sharanf  wrote:
> 
> Hi All
> I mentioned a while ago that I would start a discussion about having 
> Committers on the project that can focus on non coding contributions.
> Let me start by saying that I wouldn't be doing what I do at the ASF today if 
> the first project I contributed to (Apache OFBiz) had not recognised my non 
> coding contributions and made me a committer. My contributions to the project 
> were mainly testing and documentation. One year when ApacheCon EU came around 
> I helped organise an ApacheCon track for the project. 
> I must admit that I went through quite a few emotions when I received the 
> committership email.. first surprise, excitement but also a little 
> apprehension, because being a committer carries some responsibility. I really 
> didn't want to mess up a project that I cared so much about. What really made 
> the difference for me was that the email highlighted that the PMC trusted me. 
> They trusted me with the codebase - or more importantly they trusted that I 
> would use my judgement about whether or not to do any code changes. And 
> initially it wasn't an issue - I didn't need to update the codebase so even 
> though I was a committer, I didn't commit anything. Why should I? I continued 
> making sure that the blogs, documentation and social media promotion all kept 
> happening - which really helped the project.
> A little later on we started incorporating documentation into the codebase as 
> asciidoc files so that was when being able to commit changes to the repo 
> became a bit more important. So yes I do commit changes every now and then - 
> but only in the scope of the work I am doing. I went on to become their first 
> ever non coding PMC member too :-). And I can say that being a non coder 
> brings another perspective.
> So here in Apache Cassandra I see there is a whole lot of activity happening 
> around the website, marketing, project promotion, blogs, social media - these 
> activities are all contributions to the project. If there are contributions 
> happening in the project that need a committer to action, then it could make 
> sense to consider having committers that are focussed around the 'non coding' 
> parts.
> I would say that any contribution that helps a project in a positive way is 
> a valid contribution so recognising the people that do that work by making them 
> Committers helps not only with motivation but also shows that you value those 
> skills as well as coding.
> We want both coding and non coding contributions to earn the same merit so I 
> see it is more about trusting people to do the right thing for the project. 
> This is written from my own experience so I'm happy to get any feedback, 
> comments and other viewpoints!
> Thanks
> Sharan



[DISCUSS] Non Coding Committers

2022-02-05 Thread sharanf

Hi All

I mentioned a while ago that I would start a discussion about having 
Committers on the project that can focus on non coding contributions.


Let me start by saying that I wouldn't be doing what I do at the ASF 
today if the first project I contributed to (Apache OFBiz) had not 
recognised my non coding contributions and made me a committer. My 
contributions to the project were mainly testing and documentation. One 
year when ApacheCon EU came around I helped organise an ApacheCon track 
for the project.


I must admit that I went through quite a few emotions when I received 
the committership email.. first surprise, excitement but also a little 
apprehension, because being a committer carries some responsibility. I 
really didn't want to mess up a project that I cared so much about. What 
really made the difference for me was that the email highlighted that 
the PMC trusted me. They trusted me with the codebase - or more 
importantly they trusted that I would use my judgement about whether or 
not to do any code changes. And initially it wasn't an issue - I didn't 
need to update the codebase so even though I was a committer, I didn't 
commit anything. Why should I? I continued making sure that the blogs, 
documentation and social media promotion all kept happening - which 
really helped the project.


A little later on we started incorporating documentation into the 
codebase as asciidoc files so that was when being able to commit changes 
to the repo became a bit more important. So yes I do commit changes 
every now and then - but only in the scope of the work I am doing. I 
went on to become their first ever non coding PMC member too :-). And I 
can say that being a non coder brings another perspective.


So here in Apache Cassandra I see there is a whole lot of activity 
happening around the website, marketing, project promotion, blogs, 
social media - these activities are all contributions to the project. If 
there are contributions happening in the project that need a committer 
to action, then it could make sense to consider having committers that 
are focussed around the 'non coding' parts.


I would say that any contribution that helps a project in a positive way 
is a valid contribution so recognising the people that do that work by 
making them Committers helps not only with motivation but also shows 
that you value those skills as well as coding.


We want both coding and non coding contributions to earn the same merit 
so I see it is more about trusting people to do the right thing for the 
project.


This is written from my own experience so I'm happy to get any feedback, 
comments and other viewpoints!


Thanks

Sharan