Re: Welcome Mike Adamson as Cassandra committer

2023-12-09 Thread Jasonstack Zhao Yang
Congrats Mike!

On Sat, 9 Dec 2023 at 8:47 PM, Piotr Kołaczkowski 
wrote:

> Congratulations, Mike! Well deserved, working with you has always been a
> pleasure!
>
>
> On 09.12.2023, at 02:35, Melissa Logan wrote:
>
> 
>
> Congratulations, Mike!
>
> On Fri, Dec 8, 2023 at 11:13 AM David Capwell  wrote:
>
>> Congrats!
>>
>> On Dec 8, 2023, at 11:00 AM, Lorina Poland  wrote:
>>
>> Congratulations, Mike!
>>
>>
>>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Jasonstack Zhao Yang
+1

On Fri, 26 May 2023 at 8:44 AM, Yifan Cai  wrote:

> +1
> --
> *From:* Josh McKenzie 
> *Sent:* Thursday, May 25, 2023 5:37:02 PM
> *To:* dev 
> *Subject:* Re: [VOTE] CEP-30 ANN Vector Search
>
> +1
>
> On Thu, May 25, 2023, at 8:33 PM, Jake Luciani wrote:
>
> +1
>
> On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis  wrote:
>
> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
> --
> http://twitter.com/tjake
>
>


Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-17 Thread Jasonstack Zhao Yang
Hi,

I have updated the CEP with some details about distributed queries in the
*Approach* section.

David:

> given results have a real ranking, the current 2i logic may yield
incorrect results

C* internal iterators are all in primary key order, so we need two
in-memory top-k filters, one on the replica side and one on the coordinator
side, to make sure the returned rows are actually the top-k while still
being in primary key order.

> if 1 of the queries fails and can’t fall back to peers… does the query
fail (I assume so)

Yes, it will fail. We could make it pass if lower recall is acceptable.

Caleb:

> With smaller clusters or use-cases that are extremely
write-heavy/read-light, it's possible that the full scatter/gather won't be
too onerous, especially w/ a few small tweaks (on top of a non-vnode
cluster)

You are right. A smaller cluster would definitely require less coordinator
memory to cache all required replicas' responses.


Jeremy:

>  With SAI, can you have partial results?  When you have a query that is
non-key based, you need to have full token range coverage of the results.
If that isn't possible, will Vector Search/SAI return partial results?

No partial results are allowed. The query will fail with an unavailability
exception if some required token range is not available. For ANN search,
users might be willing to accept lower recall (partial results) in exchange
for higher availability.
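
To illustrate the coverage rule above, here is a minimal sketch in Java. It
is not the SAI implementation: TokenRange, RangeCoverage and the allowPartial
flag are hypothetical names, ranges are simplified to non-wrapping half-open
intervals, and IllegalStateException stands in for the real unavailability
exception.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified token range: half-open [start, end), no wrap-around. */
final class TokenRange {
    final long start;
    final long end;

    TokenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

final class RangeCoverage {
    /** Returns true if the union of the available ranges covers the whole required range. */
    static boolean covers(TokenRange required, List<TokenRange> available) {
        List<TokenRange> sorted = new ArrayList<>(available);
        sorted.sort(Comparator.comparingLong((TokenRange r) -> r.start));
        long coveredUpTo = required.start;
        for (TokenRange r : sorted) {
            if (r.start > coveredUpTo) break; // gap: no later range can close it
            coveredUpTo = Math.max(coveredUpTo, r.end);
            if (coveredUpTo >= required.end) return true;
        }
        return coveredUpTo >= required.end;
    }

    /**
     * Fails the query unless every required token is reachable. The allowPartial
     * flag is an assumption modelling the "lower recall, higher availability" option.
     */
    static void requireFullCoverage(TokenRange required, List<TokenRange> available, boolean allowPartial) {
        if (!allowPartial && !covers(required, available)) {
            // Stand-in for the unavailability exception mentioned in the thread
            throw new IllegalStateException("required token range is not fully available");
        }
    }
}
```

With allowPartial set to false, a hole anywhere in the required range fails
the query; setting it to true corresponds to trading recall for availability.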

>  First, how is ordering/scoring done?
> Each replica returns back to the coordinator a sorted set of results and
the coordinator will have to see all of the results globally in order to do
a global ordering.  You can't know what the top result is unless you've
seen everything.  As to the scoring, I'm not sure how that will get
calculated.

The results will be the top-k, but still in primary key order. Scores are
computed based on the vector similarity function.

Top-k search needs two top-k filters, as described in the CEP.

> Second, if I am ordering the results like for a Vector Search and I want
to have the top 1 result.  How is the scoring done and what happens if
there are 20 that have the same score?  How will the coordinator decide
which 1 is returned out of 20?

It will be the row with the smaller primary key, i.e. the one that sorts
first in primary key order.
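
The two-filter idea and the tie-breaking rule can be sketched as follows.
This is an illustration under stated assumptions, not code from the CEP or
the POC branch: ScoredRow and TopKFilter are hypothetical names, the primary
key is reduced to a single long, and a higher score means a closer match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical row: a primary key plus a similarity score (higher is better). */
final class ScoredRow {
    final long primaryKey;
    final double score;

    ScoredRow(long primaryKey, double score) {
        this.primaryKey = primaryKey;
        this.score = score;
    }
}

final class TopKFilter {
    /**
     * Keeps the k best-scoring rows in a bounded min-heap, then re-sorts the
     * survivors by primary key so the output is top-k but in primary key order.
     */
    static List<ScoredRow> topK(Iterable<ScoredRow> rows, int k) {
        if (k <= 0) return new ArrayList<>();
        // Heap head is the "worst" retained row: lowest score first and,
        // among equal scores, the larger primary key first.
        Comparator<ScoredRow> worstFirst = Comparator
                .comparingDouble((ScoredRow r) -> r.score)
                .thenComparing(Comparator.comparingLong((ScoredRow r) -> r.primaryKey).reversed());
        PriorityQueue<ScoredRow> heap = new PriorityQueue<>(worstFirst);
        for (ScoredRow row : rows) {
            heap.add(row);
            if (heap.size() > k) heap.poll(); // evict the current worst row
        }
        List<ScoredRow> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingLong((ScoredRow r) -> r.primaryKey));
        return result;
    }

    /** Coordinator side: re-apply the same filter over the replicas' already-filtered rows. */
    static List<ScoredRow> coordinatorTopK(List<List<ScoredRow>> replicaResults, int k) {
        List<ScoredRow> merged = new ArrayList<>();
        for (List<ScoredRow> rows : replicaResults) merged.addAll(rows);
        return topK(merged, k);
    }
}
```

Each replica would call topK on its local rows and the coordinator would call
coordinatorTopK on the collected responses. Because equal scores are evicted
larger-key-first, a tie for the last slot goes to the smaller primary key.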

On Wed, 10 May 2023 at 05:39, Jeremy Hanna 
wrote:

> Just wanted to add that I don't have any special knowledge of CEP-30
> beyond what Jonathan posted and just trying to help clarify and answer
> questions as I can with some knowledge and experience from DSE Search and
> SAI.  Thanks to Caleb for helping validate some things as well.  And to be
> clear about partial results - the default with DSE Search at least is to
> fail a query if it can't get the full token range coverage.  However there
> is an option to allow for shards being unavailable and return partial
> results.
>
> On May 9, 2023, at 3:38 PM, Jeremy Hanna 
> wrote:
>
> I talked to David and some others in slack to hopefully clarify:
>
> With SAI, can you have partial results?  When you have a query that is
> non-key based, you need to have full token range coverage of the results.
> If that isn't possible, will Vector Search/SAI return partial results?
>
> Anything can happen in the implementation, but for scoring, it may not
> make sense to return partial results because it's misleading.  For
> non-global queries, it could or couldn't return partial results depending
> on implementation/configuration.  In DSE you could have partial results
> depending on the options.   However I couldn't find partial results defined
> in CEP-7 or CEP-30.
>
> The other questions are about scoring.
>
> First, how is ordering/scoring done?
>
> Each replica returns back to the coordinator a sorted set of results and
> the coordinator will have to see all of the results globally in order to do
> a global ordering.  You can't know what the top result is unless you've
> seen everything.  As to the scoring, I'm not sure how that will get
> calculated.
>
> Second, if I am ordering the results like for a Vector Search and I want
> to have the top 1 result.  How is the scoring done and what happens if
> there are 20 that have the same score?  How will the coordinator decide
> which 1 is returned out of 20?
>
> It returns results in token/partition and then clustering order.
>
> On May 9, 2023, at 2:53 PM, Caleb Rackliffe 
> wrote:
>
> Anyone on this ML who still remembers DSE Search (or has experience w/
> Elastic or SolrCloud) probably also knows that there are some significant
> pieces of an optimized scatter/gather apparatus for IR (even without
> sorting, which also doesn't exist yet) that do not exist in C* or its
> range query system (which SAI and all other 2i implementations use). SAI,
> like all C* 2i implementations, is still a local index, and as that is the
> case, anything built on it will perform best in partition-scoped (at least
> on the read side) use-cases. (On the bright side, the project is moving
> toward larger partitions being a possibility.) With smaller clusters or
> use-cases that are

Re: Welcome our next PMC Chair Josh McKenzie

2023-03-23 Thread Jasonstack Zhao Yang
Congrats Josh! And thank you Mick!

On Thu, 23 Mar 2023 at 23:42, Jeremy Hanna 
wrote:

> Thank you Mick for all of your hard work in the project including your
> time as the PMC chair!
>
> Thank you too Josh for all that you do - and for the work that you'll do
> as chair!
>
> On Mar 23, 2023, at 3:22 AM, Mick Semb Wever  wrote:
>
> It is time to pass the baton on, and on behalf of the Apache Cassandra
> Project Management Committee (PMC) I would like to welcome and congratulate
> our next PMC Chair Josh McKenzie (jmckenzie).
>
> Most of you already know Josh, especially through his regular and valuable
> project oversight and status emails, always presenting a balance and
> understanding to the various views and concerns incoming.
>
> Repeating Paulo's words from last year: The chair is an administrative
> position that interfaces with the Apache Software Foundation Board, by
> submitting regular reports about project status and health. Read more about
> the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>
> The PMC as a whole is the entity that oversees and leads the project and
> any PMC member can be approached as a representative of the committee. A
> list of Apache Cassandra PMC members can be found on:
> https://cassandra.apache.org/_/community.html
>
>
>


Re: Thanks to Nate for his service as PMC Chair

2022-07-15 Thread Jasonstack Zhao Yang
Thank you, Nate!

Congrats Mick!

On Fri, 15 Jul 2022 at 02:42, Henrik Ingo  wrote:

> Thank you Nate for holding the baton for all these years. Even as a
> relative newcomer (2+ years already) I wanted to say I do understand and
> appreciate your role in carrying the torch to where the project is today.
>
> And Congratulations Mick. Your humble and quiet style of serving the
> project is something me and many others can look up to. Thank you for all
> the time and energy you bring to Cassandra.
>
> henrik
>
> On Mon, Jul 11, 2022 at 3:55 PM Paulo Motta  wrote:
>
>> Hi,
>>
>> I wanted to announce on behalf of the Apache Cassandra Project Management
>> Committee (PMC) that Nate McCall (zznate) has stepped down from the PMC
>> chair role. Thank you Nate for all the work you did as the PMC chair!
>>
>> The Apache Cassandra PMC has nominated Mick Semb Wever (mck) as the new
>> PMC chair. Congratulations and good luck on the new role Mick!
>>
>> The chair is an administrative position that interfaces with the Apache
>> Software Foundation Board, by submitting regular reports about project
>> status and health. Read more about the PMC chair role on Apache projects:
>> - https://www.apache.org/foundation/how-it-works.html#pmc
>> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>>
>> The PMC as a whole is the entity that oversees and leads the project and
>> any PMC member can be approached as a representative of the committee. A
>> list of Apache Cassandra PMC members can be found on:
>> https://cassandra.apache.org/_/community.html
>>
>> Kind regards,
>>
>> Paulo
>>
>
>
> --
>
> Henrik Ingo
>
> +358 40 569 7354
>


Re: Welcome Aleksandr Sorokoumov as Cassandra committer

2022-03-16 Thread Jasonstack Zhao Yang
Congrats Aleks!

On Wed, 16 Mar 2022 at 22:01, J. D. Jordan 
wrote:

> Congratulations!
>
> On Mar 16, 2022, at 8:43 AM, Ekaterina Dimitrova 
> wrote:
>
> 
> Great news! Well deserved! Congrats and thank you for all your support!
>
> On Wed, 16 Mar 2022 at 9:41, Paulo Motta  wrote:
>
>> Congratulations Alex, well deserved! :-)
>>
>> On Wed, 16 Mar 2022 at 10:15, Benjamin Lerer wrote:
>>
>>> The PMC members are pleased to announce that Aleksandr Sorokoumov has
>>> accepted
>>> the invitation to become committer.
>>>
>>> Thanks a lot, Aleksandr , for everything you have done for the project.
>>>
>>> Congratulations and welcome
>>>
>>> The Apache Cassandra PMC members
>>>
>>


Re: [VOTE] CEP-7: Storage Attached Index

2022-02-17 Thread Jasonstack Zhao Yang
+1

On Fri, 18 Feb 2022 at 08:15, Jeremy Hanna 
wrote:

> +1 nb. Thanks Caleb, Mike, Jason, and everyone involved with the effort.
>
> On Feb 17, 2022, at 4:23 PM, Caleb Rackliffe 
> wrote:
>
> 
> Hi everyone,
>
> I'd like to call a vote to approve CEP-7.
>
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>
> Discussion:
> https://lists.apache.org/thread/hh67k3t86m7299qkt61gmzb4h96bl90w
>
> The vote will be open for 72 hours.
> Votes by committers are considered binding.
> A vote passes if there are at least three binding +1s and no binding
> vetoes.
>
> Thanks!
> Caleb
>
>


Re: [VOTE] Release Apache Cassandra 4.0.0 (third time is the charm)

2021-07-26 Thread Jasonstack Zhao Yang
+1

On Mon, 26 Jul 2021 at 22:02, Michael Shuler  wrote:

> +1
>
> Kind regards,
> Michael
>
> On 7/22/21 5:40 PM, Brandon Williams wrote:
> > I am proposing the test build of Cassandra 4.0.0 for release.
> >
> > sha1: 902b4d31772eaa84f05ffdc1e4f4b7a66d5b17e6
> > Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.0-tentative
> > Maven Artifacts:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1244/org/apache/cassandra/cassandra-all/4.0.0/
> >
> > The Source and Build Artifacts, and Debian and RPM packages and
> > repositories are available here:
> > https://dist.apache.org/repos/dist/dev/cassandra/4.0.0/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who
> > has tested the build is invited to vote. Votes by PMC members are
> > considered binding. A vote passes if there are at least three binding
> > +1s and no -1's.
> >
> > [1]: CHANGES.txt:
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.0-tentative
> > [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.0-tentative
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
>
>


Re: Welcome Caleb Rackliffe as Cassandra committer

2021-05-14 Thread Jasonstack Zhao Yang
Congrats, Caleb!


Re: [VOTE] Release Apache Cassandra 4.0-rc1 (take2)

2021-04-22 Thread Jasonstack Zhao Yang
+1

On Fri, 23 Apr 2021 at 08:16, Nate McCall  wrote:

> +1
>
>
> On Thu, Apr 22, 2021 at 6:59 AM Mick Semb Wever  wrote:
>
> > Proposing the test build of Cassandra 4.0-rc1 for release.
> >
> > sha1: 3282f5ecf187ecbb56b8d73ab9a9110c010898b0
> > Git:
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-rc1-tentative
> > Maven Artifacts:
> >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1235/org/apache/cassandra/cassandra-all/4.0-rc1/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and
> > repositories, are available here:
> > https://dist.apache.org/repos/dist/dev/cassandra/4.0-rc1/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who
> > has tested the build is invited to vote. Votes by PMC members are
> > considered binding. A vote passes if there are at least three binding
> > +1s and no -1's.
> >
> > [1]: CHANGES.txt:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-rc1-tentative
> > [2]: NEWS.txt:
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-rc1-tentative
> >
> >
> >
>


Re: Welcome Berenguer Blasi as Cassandra committer

2021-03-25 Thread Jasonstack Zhao Yang
Congrats Berenguer!

On Thu, 25 Mar 2021 at 18:11, Erick Ramirez 
wrote:

> Congratulations, Berenguer! Thanks for all the work you've done. 🍻
>
>
> On Thu, 25 Mar 2021 at 21:10, Benjamin Lerer  wrote:
>
> >  The PMC's members are pleased to announce that Berenguer Blasi has
> > accepted the invitation to become committer today.
> >
> > Thanks a lot,  Berenguer,  for all the work you have done!
> >
> > Congratulations and welcome
> >
> > The Apache Cassandra PMC members
> >
>


Re: Welcome Paulo Motta as Cassandra PMC member

2021-02-09 Thread Jasonstack Zhao Yang
Congrats Paulo!

On Wed, 10 Feb 2021 at 00:03, Ekaterina Dimitrova 
wrote:

> Congrats! Well done!
>
> On Tue, 9 Feb 2021 at 11:02, J. D. Jordan 
> wrote:
>
> > Congrats Paulo! A great addition to the PMC.
> >
> > > On Feb 9, 2021, at 9:59 AM, Jonathan Ellis  wrote:
> > >
> > > Congratulations, Paulo!  Well deserved.
> > >
> > >> On Tue, Feb 9, 2021 at 9:54 AM Benjamin Lerer <benjamin.le...@datastax.com>
> > >> wrote:
> > >>
> > >> The PMC's members are pleased to announce that Paulo Motta has accepted
> > >> the invitation to become a PMC member yesterday.
> > >>
> > >> Thanks a lot, Paulo, for everything you have done for the project all
> > >> these years.
> > >>
> > >> Congratulations and welcome
> > >>
> > >> The Apache Cassandra PMC members
> > >>
> > >
> > >
> > > --
> > > Jonathan Ellis
> > > co-founder, http://www.datastax.com
> > > @spyced
> >
> >
> >
>


Re: Regarding Materialized Views

2020-12-18 Thread Jasonstack Zhao Yang
Hi,

> 1. When will MVs be enabled for production use again ?

In 4.x at the earliest; it will be assessed after CASSANDRA-15921.


> 2. Is there any plan to support secondary indexes on materialized views ?

Unless your index query is partition-restricted on the new MV partition key,
querying a 2i on an MV won't be faster than querying a 2i on the base table.


On Fri, 18 Dec 2020 at 17:33, Shaurya Gupta  wrote:

> Hi
> A couple of questions on materialized views in cassandra -
> 1. When will MVs be enabled for production use again ?
> 2. Is there any plan to support secondary indexes on materialized views ?
>
> Thanks
> --
> Shaurya Gupta
>


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-24 Thread Jasonstack Zhao Yang
>> Question is: is this planned as a next step?
>> If yes, how are we going to mark SAI as experimental until it gets
>> row offsets? Also, it is likely that index format is going to change when
>> row offsets are added, so my concern is that we may have to support two
>> versions of a format for a smooth migration.

The goal is to support row-level indexing by the time SAI is merged. I will
update the CEP accordingly.

>> I think switching to row
>> offsets also has a huge impact on interaction with SPRC and has some
>> potential for optimisations.

Can you share more details on the optimizations?



On Thu, 24 Sep 2020 at 15:20, Oleksandr Petrov 
wrote:

> > But for improving overall index read performance, I think improving base
> > table read perf (because SAI/SASI executes LOTS of
> > SinglePartitionReadCommand after searching on-disk index) is more effective
> > than switching from Trie to Prefix BTree.
>
> I haven't suggested switching to Prefix B-Tree or any other structure, the
> question was about rationale and motivation of picking one over the other,
> which I am curious about for personal reasons/interests that lie outside of
> Cassandra. Having this listed in CEP could have been helpful for future
> guidance. It's ok if this question is outside of the CEP scope.
>
> I also agree that there are many areas that require improvement around the
> read/write path and 2i, many of which (even outside of base table format or
> read perf) can yield positive performance results.
>
> > FWIW, I personally look forward to receiving that contribution when the
> > time is right.
>
> I am very excited for this contribution, too, and it looks like very solid
> work.
>
> I have one more question, about "Upon resolving partition keys, rows are
> loaded using Cassandra’s internal partition read command across SSTables
> and are post filtered". One of the criticisms of SASI and reasons for
> marking it as experimental was CASSANDRA-11990. I think switching to row
> offsets also has a huge impact on interaction with SPRC and has some
> potential for optimisations. Question is: is this planned as a next step?
> If yes, how are we going to mark SAI as experimental until it gets
> row offsets? Also, it is likely that index format is going to change when
> row offsets are added, so my concern is that we may have to support two
> versions of a format for a smooth migration.
>
>
>
> On Thu, Sep 24, 2020 at 6:53 AM Jasonstack Zhao Yang <
> jasonstack.z...@gmail.com> wrote:
>
> > >> I think CEP should be more upfront with "eventually replace
> > >> it" bit, since it raises the question about what the people who are
> > >> using other index implementations can expect.
> >
> > Will update the CEP to emphasize: SAI will replace other indexes.
> >
> > >> Unfortunately, I do not have an
> > >> implementation sitting around for a direct comparison, but I can
> > >> imagine situations when B-Trees may perform better because of simpler
> > >> construction. Maybe we should even consider prototyping a prefix B-Tree
> > >> to have a more fair comparison.
> >
> > As long as prefix BTree supports range/prefix aggregation (which is used
> > to speed up range/prefix query when matching entire subtree), we can plug
> > it in and compare. It won't affect the CEP design which focuses on sharing
> > data across indexes and posting aggregation.
> >
> > But for improving overall index read performance, I think improving base
> > table read perf (because SAI/SASI executes LOTS of
> > SinglePartitionReadCommand after searching on-disk index) is more
> > effective than switching from Trie to Prefix BTree.
> >
> >
> >
> > On Thu, 24 Sep 2020 at 05:33, Benedict Elliott Smith <
> bened...@apache.org>
> > wrote:
> >
> > > FWIW, I personally look forward to receiving that contribution when the
> > > time is right.
> > >
> > > On 23/09/2020, 18:45, "Josh McKenzie"  wrote:
> > >
> > > talking about that would involve some bits of information DataStax
> > > might
> > > not be ready to share?
> > >
> > > At the risk of derailing, I've been poking and prodding this week
> at
> > we
> > > contributors at DS getting our act together w/a draft CEP for
> > donating
> > > the
> > > trie-based indices to the ASF project.
> > >
> > > More to come; the intention is certainly to contribute that code.
> The
>

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Jasonstack Zhao Yang
>> I think CEP should be more upfront with "eventually replace
>> it" bit, since it raises the question about what the people who are using
>> other index implementations can expect.

Will update the CEP to emphasize: SAI will replace other indexes.

>> Unfortunately, I do not have an
>> implementation sitting around for a direct comparison, but I can imagine
>> situations when B-Trees may perform better because of simpler construction.
>> Maybe we should even consider prototyping a prefix B-Tree to have a more
>> fair comparison.

As long as prefix BTree supports range/prefix aggregation (which is used to
speed up range/prefix query when matching entire subtree), we can plug it in
and compare. It won't affect the CEP design which focuses on sharing data
across indexes and posting aggregation.

But for improving overall index read performance, I think improving base
table read perf (because SAI/SASI executes LOTS of SinglePartitionReadCommand
after searching on-disk index) is more effective than switching from Trie to
Prefix BTree.



On Thu, 24 Sep 2020 at 05:33, Benedict Elliott Smith 
wrote:

> FWIW, I personally look forward to receiving that contribution when the
> time is right.
>
> On 23/09/2020, 18:45, "Josh McKenzie"  wrote:
>
> talking about that would involve some bits of information DataStax might
> not be ready to share?
>
> At the risk of derailing, I've been poking and prodding this week at we
> contributors at DS getting our act together w/a draft CEP for donating the
> trie-based indices to the ASF project.
>
> More to come; the intention is certainly to contribute that code. The lack
> of a destination to merge it into (i.e. no 5.0-dev branch) is removing
> significant urgency from the process as well (not to open a 3rd Pandora's
> box), but there's certainly an interrelatedness to the conversations going
> on.
>
> ---
> Josh McKenzie
>
>
>
> On Wed, Sep 23, 2020 at 12:48 PM, Caleb Rackliffe <
> calebrackli...@gmail.com>
> wrote:
>
> > As long as we can construct the on-disk indexes efficiently/directly from
> > a Memtable-attached index on flush, there's room to try other data
> > structures. Most of the innovation in SAI is around the layout of postings
> > (something we can expand on if people are interested) and having a
> > natively row-oriented design that scales w/ multiple indexed columns on
> > single SSTables. There are some broader implications of using the trie that
> > reach outside SAI itself, but talking about that would involve some bits of
> > information DataStax might not be ready to share?
> >
> > On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan < jeremiah.jordan@
> > gmail.com> wrote:
> >
> > Short question: looking forward, how are we going to maintain three 2i
> > implementations: SASI, SAI, and 2i?
> >
> > I think one of the goals stated in the CEP is for SAI to have parity with
> > 2i such that it could eventually replace it.
> >
> > On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <
> >
> > oleksandr.pet...@gmail.com> wrote:
> >
> > Short question: looking forward, how are we going to maintain three 2i
> > implementations: SASI, SAI, and 2i?
> >
> > Another thing I think this CEP is missing is rationale and motivation
> > about why trie-based indexes were chosen over, say, B-Tree. We did have a
> > short discussion about this on Slack, but both arguments that I've heard
> > (space-saving and keeping a small subset of nodes in memory) work only for
> > the most primitive implementation of a B-Tree. Fully-occupied prefix B-Tree
> > can have similar properties. There's been a lot of research on B-Trees and
> > optimisations in those. Unfortunately, I do not have an implementation
> > sitting around for a direct comparison, but I can imagine situations when
> > B-Trees may perform better because of simpler construction.
> >
> > Maybe we should even consider prototyping a prefix B-Tree to have a more
> > fair comparison.
> >
> > Thank you,
> > -- Alex
> >
> > On Thu,

Re: [VOTE] Accept the Harry donation

2020-09-17 Thread Jasonstack Zhao Yang
+1 nb

On Fri, Sep 18, 2020, 12:26 Oleksandr Petrov 
wrote:

> +1
>
> On Thu, Sep 17, 2020 at 6:28 PM Blake Eggleston
>  wrote:
>
> > +1
> >
> > > On Sep 16, 2020, at 2:45 AM, Mick Semb Wever  wrote:
> > >
> > > This vote is about officially accepting the Harry donation from Alex Petrov
> > > and Benedict Elliott Smith, that was worked on in CASSANDRA-15348.
> > >
> > > The Incubator IP Clearance has been filled out at
> > > http://incubator.apache.org/ip-clearance/apache-cassandra-harry.html
> > >
> > > This vote is a required part of the IP Clearance process. It follows the
> > > same voting rules as releases, i.e. from the PMC a minimum of three +1s and
> > > no -1s.
> > >
> > > Please cast your votes:
> > >   [ ] +1 Accept the contribution into Cassandra
> > >   [ ] -1 Do not
> >
> >
> >
> >
>
> --
> alex p
>


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-10 Thread Jasonstack Zhao Yang
Thank you Patrick for hosting Cassandra Contributor Meeting for CEP-7 SAI.

The recorded video is available here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting

On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang 
wrote:

> Thank you, Charles and Patrick
>
> On Tue, 1 Sep 2020 at 04:56, Charles Cao  wrote:
>
>> Thank you, Patrick!
>>
>> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin 
>> wrote:
>> >
>> > I just moved it to 8AM for this meeting to better accommodate APAC.
>> Please
>> > see the update here:
>> >
>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>> >
>> > Patrick
>> >
>> > On Mon, Aug 31, 2020 at 10:04 AM Charles Cao 
>> wrote:
>> >
>> > > Patrick,
>> > >
>> > > 11AM PST is a bad time for the people in the APAC timezone. Can we
>> > > move it to 7 or 8AM PST in the morning to accommodate their needs ?
>> > >
>> > > ~Charles
>> > >
>> > > On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin 
>> > > wrote:
>> > > >
>> > > > Meeting scheduled.
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>> > > >
>> > > > Tuesday September 1st, 11AM PST. I added a basic bullet for the
>> agenda
>> > > but
>> > > > if there is more, edit away.
>> > > >
>> > > > Patrick
>> > > >
>> > > > On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <
>> > > > jasonstack.z...@gmail.com> wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <
>> > > e.dimitr...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <
>> > > calebrackli...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > +1
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <
>> pmcfa...@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > > This is related to the discussion Jordan and I had about the
>> > > > > > contributor
>> > > > > > >
>> > > > > > > > Zoom call. Instead of open mic for any issue, call it based
>> on a
>> > > > > > > discussion
>> > > > > > >
>> > > > > > > > thread or threads for higher bandwidth discussion.
>> > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > > I would be happy to schedule one for next week to
>> specifically
>> > > discuss
>> > > > > > >
>> > > > > > > > CEP-7. I can attach the recorded call to the CEP after.
>> > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > +1 or -1?
>> > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > Patrick
>> > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <
>> > > > > jmcken...@apache.org>
>> > > > > > >
>> > > > > > > > wrote:
>> > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > > >
>> > > > > > >
>> > > > > > > > > > Does community plan to open another discussion or CEP on
>> > > > > > >
>> > > > > > > > modularization?
>> >

Re: [DISCUSS] Change style guide to recommend use of @Override

2020-09-01 Thread Jasonstack Zhao Yang
+1

On Wed, 2 Sep 2020 at 02:45, Dinesh Joshi  wrote:

> +1
>
> > On Sep 1, 2020, at 11:27 AM, David Capwell  wrote:
> >
> > Currently our style guide recommends to avoid using @Override and updates
> > intellij's code style to exclude it by default; I would like to propose we
> > change this recommendation to use it and to update intellij's style to
> > include it by default.
> >
> > @Override is used by javac to enforce that a method is in fact overriding
> > from an abstract class or an interface and if this stops being true (such
> > as a refactor happens) then a compiler error is thrown; when we default to
> > excluding, it makes it harder to detect that a refactor catches all
> > implementations and can lead to subtle and hard to track down bugs.
> >
> > This proposal is for new code and would not be to go rewrite all code at
> > once, but would recommend new code adopt this style, and to pull old code
> > forward which is related to changes being made (similar to our stance on
> > imports).
> >
> > If people are ok with this, I will file a JIRA, update the docs, and
> > update intellij's formatting.
> >
> > Thanks for your time!
>
>
>
>
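
The refactoring hazard described in the proposal can be shown with a small
Java sketch. The class names (BaseTask, TaskWithoutAnnotation,
TaskWithAnnotation) are hypothetical and not taken from the Cassandra code
base. The scenario assumes a refactor changed the base method's parameter
from int to long.

```java
class BaseTask {
    // After the imagined refactor the parameter is a long (it used to be an int).
    String describe(long sizeInBytes) {
        return "base:" + sizeInBytes;
    }
}

class TaskWithoutAnnotation extends BaseTask {
    // Written against the old describe(int). Without @Override this still
    // compiles, but it is now an unrelated overload, not an override.
    String describe(int sizeInBytes) {
        return "logging:" + sizeInBytes;
    }
}

class TaskWithAnnotation extends BaseTask {
    // With @Override, javac would have rejected the stale describe(int)
    // signature, forcing it to be updated together with the base class.
    @Override
    String describe(long sizeInBytes) {
        return "logging:" + sizeInBytes;
    }
}
```

Calling describe(10) through a BaseTask reference returns "base:10" for
TaskWithoutAnnotation, because the subclass method no longer overrides
anything, and "logging:10" for TaskWithAnnotation.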


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-31 Thread Jasonstack Zhao Yang
Thank you, Charles and Patrick

On Tue, 1 Sep 2020 at 04:56, Charles Cao  wrote:

> Thank you, Patrick!
>
> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin 
> wrote:
> >
> > I just moved it to 8AM for this meeting to better accommodate APAC.
> Please
> > see the update here:
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> >
> > Patrick
> >
> > On Mon, Aug 31, 2020 at 10:04 AM Charles Cao 
> wrote:
> >
> > > Patrick,
> > >
> > > 11AM PST is a bad time for the people in the APAC timezone. Can we
> > > move it to 7 or 8AM PST in the morning to accommodate their needs ?
> > >
> > > ~Charles
> > >
> > > On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin 
> > > wrote:
> > > >
> > > > Meeting scheduled.
> > > >
> > >
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> > > >
> > > > Tuesday September 1st, 11AM PST. I added a basic bullet for the
> agenda
> > > but
> > > > if there is more, edit away.
> > > >
> > > > Patrick
> > > >
> > > > On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <
> > > > jasonstack.z...@gmail.com> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <
> > > e.dimitr...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <
> > > calebrackli...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <
> pmcfa...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > This is related to the discussion Jordan and I had about the
> > > > > > contributor
> > > > > > >
> > > > > > > > Zoom call. Instead of open mic for any issue, call it based
> on a
> > > > > > > discussion
> > > > > > >
> > > > > > > > thread or threads for higher bandwidth discussion.
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > I would be happy to schedule on for next week to specifically
> > > discuss
> > > > > > >
> > > > > > > > CEP-7. I can attach the recorded call to the CEP after.
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > +1 or -1?
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > Patrick
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <
> > > > > jmcken...@apache.org>
> > > > > > >
> > > > > > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > > > > > > > Does community plan to open another discussion or CEP on
> > > > > > >
> > > > > > > > modularization?
> > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > > > > We probably should have a discussion on the ML or monthly
> > > contrib
> > > > > > call
> > > > > > >
> > > > > > > > > about it first to see how aligned the interested
> contributors
> > > are.
> > > > > > > Could
> > > > > > >
> > > > > > > > do
> > > > > > >
> > > > > > > > > that through CEP as well but CEP's (at least thus far sans
> k8s
> > > > &

Re: [VOTE] Release Apache Cassandra 4.0-beta2

2020-08-28 Thread Jasonstack Zhao Yang
+1

On Sat, 29 Aug 2020 at 00:28, Joshua McKenzie  wrote:

> +1
>
> On Fri, Aug 28, 2020 at 11:48 AM Brandon Williams 
> wrote:
>
> > +1
> >
> > On Fri, Aug 28, 2020, 9:19 AM Mick Semb Wever  wrote:
> >
> > > Proposing the test build of Cassandra 4.0-beta2 for release.
> > >
> > > sha1: 56eadf2004399a80f0733041cacf03839832249a
> > > Git:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-beta2-tentative
> > > Maven Artifacts:
> > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1218/org/apache/cassandra/cassandra-all/4.0-beta2/
> > >
> > > The Source and Build Artifacts, and the Debian and RPM packages and
> > > repositories, are available here:
> > > https://dist.apache.org/repos/dist/dev/cassandra/4.0-beta2/
> > >
> > > The vote will be open for 72 hours (longer if needed). Everyone who has
> > > tested the build is invited to vote. Votes by PMC members are
> considered
> > > binding. A vote passes if there are at least three binding +1s and no
> > -1's.
> > >
> > > [1]: CHANGES.txt:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-beta2-tentative
> > > [2]: NEWS.txt:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-beta2-tentative
> > >
> >
>


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-27 Thread Jasonstack Zhao Yang
+1

On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova 
wrote:

> +1
>
> On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe 
> wrote:
>
> > +1
> >
> >
> >
> > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin 
> wrote:
> >
> >
> >
> > > This is related to the discussion Jordan and I had about the
> contributor
> >
> > > Zoom call. Instead of open mic for any issue, call it based on a
> > discussion
> >
> > > thread or threads for higher bandwidth discussion.
> >
> > >
> >
> > > I would be happy to schedule on for next week to specifically discuss
> >
> > > CEP-7. I can attach the recorded call to the CEP after.
> >
> > >
> >
> > > +1 or -1?
> >
> > >
> >
> > > Patrick
> >
> > >
> >
> > > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie 
> >
> > > wrote:
> >
> > >
> >
> > > > >
> >
> > > > > Does community plan to open another discussion or CEP on
> >
> > > modularization?
> >
> > > >
> >
> > > > We probably should have a discussion on the ML or monthly contrib
> call
> >
> > > > about it first to see how aligned the interested contributors are.
> > Could
> >
> > > do
> >
> > > > that through CEP as well but CEP's (at least thus far sans k8s
> > operator)
> >
> > > > tend to start with a strong, deeply thought out point of view being
> >
> > > > expressed.
> >
> > > >
> >
> > > > On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <
> >
> > > > jasonstack.z...@gmail.com> wrote:
> >
> > > >
> >
> > > > > >>> SASI's performance, specifically the search in the B+ tree
> >
> > > component,
> >
> > > > > >>> depends a lot on the component file's header being available in
> > the
> >
> > > > > >>> pagecache. SASI benefits from (needs) nodes with lots of RAM.
> Is
> >
> > > SAI
> >
> > > > > bound
> >
> > > > > >>> to this same or similar limitation?
> >
> > > > >
> >
> > > > > SAI also benefits from larger memory because SAI puts block info on
> >
> > > heap
> >
> > > > > for searching on-disk components and having cross-index files on
> page
> >
> > > > cache
> >
> > > > > improves read performance of different indexes on the same table.
> >
> > > > >
> >
> > > > >
> >
> > > > > >>> Flushing of SASI can be CPU+IO intensive, to the point of
> >
> > > saturation,
> >
> > > > > >>> pauses, and crashes on the node. SSDs are a must, along with a
> > bit
> >
> > > of
> >
> > > > > >>> tuning, just to avoid bringing down your cluster. Beyond
> reducing
> >
> > > > space
> >
> > > > > >>> requirements, does SAI improve on these things? Like SASI how
> > does
> >
> > > > SAI,
> >
> > > > > in
> >
> > > > > >>> its own way, change/narrow the recommendations on node hardware
> >
> > > > specs?
> >
> > > > >
> >
> > > > > SAI won't crash the node during compaction and requires less
> CPU/IO.
> >
> > > > >
> >
> > > > > * SAI defines global memory limit for compaction instead of
> per-index
> >
> > > > > memory limit used by SASI.
> >
> > > > >   For example, compactions are running on 10 tables and each has 10
> >
> > > > > indexes. SAI will cap the
> >
> > > > >   memory usage with global limit while SASI may use up to 100 *
> >
> > > per-index
> >
> > > > > limit.
> >
> > > > >
> >
> > > > > * After flushing in-memory segments to disk, SAI won't merge
> on-disk
> >
> > > > > segments while SASI
> >
> > > > >   attempts to merge them at the end.
> >
> > > > >
> >
> > > > >   There are pros and cons of not merging segments:
> >
> > > > > ** Pros: compaction runs faster and r

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-25 Thread Jasonstack Zhao Yang
>>> SASI's performance, specifically the search in the B+ tree component,
>>> depends a lot on the component file's header being available in the
>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI
>>> bound to this same or similar limitation?

SAI also benefits from more memory: it keeps block info on heap for
searching on-disk components, and having the cross-index files in the page
cache improves read performance for the different indexes on the same table.


>>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>>> pauses, and crashes on the node. SSDs are a must, along with a bit of
>>> tuning, just to avoid bringing down your cluster. Beyond reducing space
>>> requirements, does SAI improve on these things? Like SASI how does SAI,
>>> in its own way, change/narrow the recommendations on node hardware specs?

SAI won't crash the node during compaction and requires less CPU/IO.

* SAI defines a global memory limit for compaction instead of the per-index
  memory limit used by SASI.
  For example, if compactions are running on 10 tables and each has 10
  indexes, SAI caps the memory usage at the global limit, while SASI may use
  up to 100 * the per-index limit.

* After flushing in-memory segments to disk, SAI won't merge on-disk
  segments, while SASI attempts to merge them at the end.

  There are pros and cons to not merging segments:
  ** Pros: compaction runs faster and requires fewer resources.
  ** Cons: small segments reduce the compression ratio.

* SAI on-disk format with row ids compresses better.
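
To make the memory arithmetic in the first bullet concrete, here is a rough
Java sketch. The class GlobalSegmentBudget, its methods and the byte values are
made up for illustration; they are not SAI or SASI classes or configuration
options, and the real accounting may work differently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of one budget shared by all index builders.
final class GlobalSegmentBudget
{
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    GlobalSegmentBudget(long limitBytes)
    {
        this.limitBytes = limitBytes;
    }

    // Records bytes buffered by any index builder. Returns true when the
    // shared limit is exceeded, meaning the caller should flush a segment.
    boolean addAndCheck(long bytes)
    {
        return usedBytes.addAndGet(bytes) > limitBytes;
    }

    // Called after a builder has flushed a segment of the given size.
    void release(long bytes)
    {
        usedBytes.addAndGet(-bytes);
    }

    long used()
    {
        return usedBytes.get();
    }

    // Worst case when every index being compacted has its own limit:
    // tables * indexesPerTable * perIndexLimitBytes.
    static long perIndexWorstCase(int tables, int indexesPerTable, long perIndexLimitBytes)
    {
        return Math.multiplyExact((long) tables * indexesPerTable, perIndexLimitBytes);
    }
}
```

With 10 tables and 10 indexes each, perIndexWorstCase grows to 100 times the
per-index limit, while the shared budget stays bounded by a single value no
matter how many indexes are being built.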


>>> I understand the desire in keeping out of scope the longer term
>>> deprecation and migration plan, but… if SASI provides functionality that
>>> SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a
>>> body of code ~somewhat similar, shouldn't we be roughly sketching out how
>>> to reduce the maintenance surface area?

Agreed that we should reduce the maintenance area if possible, but only a
very limited part of the code base (e.g. RangeIterator, QueryPlan) can be
shared. The rest of the code base is quite different because of the on-disk
format and cross-index files.

The goal of this CEP is to get community buy-in on SAI's design.
Tokenization and DelimiterAnalyzer should be straightforward to implement on
top of SAI.

>>> Can we list what configurations of SASI will become deprecated once SAI
>>> becomes non-experimental?

Except for "Like", "Tokenisation", "DelimiterAnalyzer", the rest of SASI can
be replaced by SAI.

>>> Given a few bugs are open against 2i and SASI, can we provide some
>>> overview, or rough indication, of how many of them we could "triage
>>> away"?

I believe most of the known bugs in 2i/SASI have either been addressed in
SAI or don't apply to SAI.

>>> And, is it time for the project to start introducing new SPI
>>> implementations as separate sub-modules and jar files that are only
>>> loaded at runtime based on configuration settings? (sorry for the
>>> conflation on this one, but maybe it's the right time to raise it :shrug:)

Agreed that modularization is the way to go and will speed up module
development.

Does the community plan to open another discussion or CEP on modularization?


On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever  wrote:

> Adding to Duy's questions…
>
>
> * Hardware specs
>
> SASI's performance, specifically the search in the B+ tree component,
> depends a lot on the component file's header being available in the
> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
> to this same or similar limitation?
>
> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
> pauses, and crashes on the node. SSDs are a must, along with a bit of
> tuning, just to avoid bringing down your cluster. Beyond reducing space
> requirements, does SAI improve on these things? Like SASI how does SAI, in
> its own way, change/narrow the recommendations on node hardware specs?
>
>
> * Code Maintenance
>
> I understand the desire in keeping out of scope the longer term deprecation
> and migration plan, but… if SASI provides functionality that SAI doesn't,
> like tokenisation and DelimiterAnalyzer, yet introduces a body of code
> ~somewhat similar, shouldn't we be roughly sketching out how to reduce the
> maintenance surface area?
>
> Can we list what configurations of SASI will become deprecated once SAI
> becomes non-experimental?
>
> Given a few bugs are open against 2i and SASI, can we provide some
> overview, or rough indication, of how many of them we could "triage away"?
>
> And, is it time for the project to start introducing new SPI
> implementations as separate sub-modules and jar files that are only loaded
> at runtime based on configuration settings? (sorry for the conflation on
> this one, but maybe it's the right time to raise it :shrug:)
>
> regards,
> Mick
>
>
> On Tue, 18 Aug 2020 at 13:05, DuyHai Doan  wrote:
>
> > Thank you Zhao Yang for starting this topic
> >
> > After reading the short design doc, I have 

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-24 Thread Jasonstack Zhao Yang
> I think the project needs to conclude the discussions that keep being
> started around the "definition of done" before determining what sufficient
> quality assurance looks like for this feature.

Looking forward to the Test/QA guideline. Thanks for bringing this up.


> the CEP process suggest a wiki page

Added CEP-7 SAI cwiki:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index

On Sat, 22 Aug 2020 at 01:01, Jason Rutherglen 
wrote:

> > About space efficiency, one of the biggest drawbacks of SASI was the huge
> > space required for index structure when using CONTAINS logic because of
> > the decomposition of text columns into n-grams. Will SAI suffer from the
> > same issue in future iterations ?
>
> SAI does not have specific ngram support atm, though that may be added
> with tokenizers.  Ngrams do indeed grow the index, that's a user
> decision for faster queries or more disk space.
>
> On Tue, Aug 18, 2020 at 6:05 AM DuyHai Doan  wrote:
> >
> > Thank you Zhao Yang for starting this topic
> >
> > After reading the short design doc, I have a few questions
> >
> > 1) SASI was pretty inefficient indexing wide partitions because the index
> > structure only retains the partition token, not the clustering columns. As
> > per design doc SAI has row id mapping to partition offset, can we hope
> > that indexing wide partition will be more efficient with SAI ? One detail
> > that worries me is that in the beginning of the design doc, it is said
> > that the matching rows are post filtered while scanning the partition.
> > Can you confirm or refute that SAI is efficient with wide partitions and
> > provides the partition offsets to the matching rows ?
> >
> > 2) About space efficiency, one of the biggest drawbacks of SASI was the
> > huge space required for index structure when using CONTAINS logic because
> > of the decomposition of text columns into n-grams. Will SAI suffer from
> > the same issue in future iterations ? I'm anticipating a bit
> >
> > 3) If I'm querying using SAI and providing complete partition key, will
> > it be more efficient than querying without partition key. In other words,
> > does SAI provide any optimisation when partition key is specified ?
> >
> > Regards
> >
> > Duy Hai DOAN
> >
> > On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever wrote:
> >
> > > >
> > > > We are looking forward to the community's feedback and suggestions.
> > > >
> > >
> > >
> > > What comes immediately to mind is testing requirements. It has been
> > > mentioned already that the project's testability and QA guidelines are
> > > inadequate to successfully introduce new features and refactorings to
> > > the codebase. During the 4.0 beta phase this was intended to be
> > > addressed, i.e. defining more specific QA guidelines for 4.0-rc. This
> > > would be an important step towards QA guidelines for all changes and
> > > CEPs post-4.0.
> > >
> > > Questions from me
> > >  - How will this be tested, how will its QA status and lifecycle be
> > > defined? (per above)
> > >  - With existing C* code needing to be changed, what is the proposed
> > > plan for making those changes ensuring maintained QA, e.g. is there
> > > separate QA cycles planned for altering the SPI before adding a new SPI
> > > implementation?
> > >  - Despite being out of scope, it would be nice to have some idea from
> > > the CEP author of when users might still choose afresh 2i or SASI over
> > > SAI,
> > >  - Who fills the roles involved? Who are the contributors in this
> > > DataStax team? Who is the shepherd? Are there other stakeholders willing
> > > to be involved?
> > >  - Is there a preference to use gdoc instead of the project's wiki, and
> > > why? (the CEP process suggest a wiki page, and feedback on why another
> > > approach is considered better helps evolve the CEP process itself)
> > >
> > > cheers,
> > > Mick
> > >
>
>
>


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-19 Thread Jasonstack Zhao Yang
Hi Duy, great questions.

> 1) SASI was pretty inefficient indexing wide partitions because the index
> structure only retains the partition token, not the clustering columns. As
> per design doc SAI has row id mapping to partition offset, can we hope
> that indexing wide partition will be more efficient with SAI ? One detail
> that worries me is that in the beginning of the design doc, it is said that
> the matching rows are post filtered while scanning the partition. Can you
> confirm or refute that SAI is efficient with wide partitions and provides
> the partition offsets to the matching rows ?

As of now, SAI indexes the partition offset, same as SASI. But during design
we took row-level indexing into consideration, and row-awareness is being
prototyped.

For the record, partition-level indexing works nicely when most rows in the
wide partition match the indexed value. After switching to a row-level index,
when most rows in a wide partition match, the index engine needs to fall back
to partition-level index behavior (scanning the entire partition +
post-filter) instead of fetching single rows many times.

> 2) About space efficiency, one of the biggest drawbacks of SASI was the
> huge space required for index structure when using CONTAINS logic because
> of the decomposition of text columns into n-grams. Will SAI suffer from the
> same issue in future iterations ? I'm anticipating a bit

Tokenization wasn't part of the CEP scope.

Off the top of my head, I think tokenization does require more space, as
both SAI and SASI need to store matches for every decomposed value. But with
frame-of-reference encoding on row ids, SAI should require less disk space
than SASI.
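
As a rough illustration of why frame-of-reference encoding saves space, here
is a generic Java sketch of the technique. It is not SAI's actual on-disk
format; the class and method names are hypothetical, and a real implementation
would also bit-pack the offsets rather than keep them in a long[].

```java
// Generic frame-of-reference sketch; names are hypothetical.
final class FrameOfReference
{
    // Bits needed per value once the block minimum has been subtracted.
    static int bitsPerValue(long[] sortedRowIds)
    {
        if (sortedRowIds.length == 0)
            return 0;
        long range = sortedRowIds[sortedRowIds.length - 1] - sortedRowIds[0];
        return 64 - Long.numberOfLeadingZeros(range);
    }

    // Stores each row id as an offset from the first (smallest) id.
    static long[] encode(long[] sortedRowIds)
    {
        long[] offsets = new long[sortedRowIds.length];
        for (int i = 0; i < sortedRowIds.length; i++)
            offsets[i] = sortedRowIds[i] - sortedRowIds[0];
        return offsets;
    }

    // Restores the original row ids from the base and the offsets.
    static long[] decode(long base, long[] offsets)
    {
        long[] rowIds = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++)
            rowIds[i] = base + offsets[i];
        return rowIds;
    }
}
```

For the sorted row ids 1000000, 1000003 and 1000007, the offsets are 0, 3 and
7, so each value fits in 3 bits plus one shared base, instead of the 20 bits
each raw id would need.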

> 3) If I'm querying using SAI and providing complete partition key, will it
> be more efficient than querying without partition key. In other words,
> does SAI provide any optimisation when partition key is specified ?

Yes.

* On coordinator, it will find replicas with PK.
* On replica side:
 - it will skip to given PK token
 - there is some pruning based on min/max key of index segments.
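
The min/max pruning mentioned above can be sketched roughly as follows. This is
only an illustration of the general idea, assuming each index segment records
the smallest and largest partition token it covers; SegmentPruning, Segment and
segmentsToSearch are hypothetical names, not SAI classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of skipping segments by their min/max token bounds.
final class SegmentPruning
{
    static final class Segment
    {
        final String name;
        final long minToken;
        final long maxToken;

        Segment(String name, long minToken, long maxToken)
        {
            this.name = name;
            this.minToken = minToken;
            this.maxToken = maxToken;
        }

        // A segment can only hold matches if the token is within its bounds.
        boolean mayContain(long token)
        {
            return minToken <= token && token <= maxToken;
        }
    }

    // Returns the names of the segments that still need to be searched.
    static List<String> segmentsToSearch(List<Segment> segments, long token)
    {
        List<String> result = new ArrayList<>();
        for (Segment segment : segments)
            if (segment.mayContain(token))
                result.add(segment.name);
        return result;
    }
}
```

With a complete partition key the token is known up front, so any segment whose
range does not include it can be skipped without reading its index data.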

> 4) Are collections, static columns, composite partition key components and
> UDT indexing (at any depth) on the roadmap of SAI ? I strongly believe
> that those features are the bare minimum to make SAI an interesting
> replacement for the native 2nd index as well as SASI. SASI's limited support
> for those advanced data structures has hindered its wide adoption (among
> other issues and bugs)

Collections, static columns and composite partition keys are supported.

I think "UDT indexing (at any depth)" can be added because there is no
architectural limitation in SAI or SASI.

I have invited you to the Slack channel #cassandra-sai; I really appreciate
your participation.


On Tue, 18 Aug 2020 at 19:33, DuyHai Doan  wrote:

> Last but not least
>
> 4) Are collections, static columns, composite partition key components and
> UDT indexing (at any depth) on the roadmap of SAI ? I strongly believe
> that those features are the bare minimum to make SAI an interesting
> replacement for the native 2nd index as well as SASI. SASI's limited support
> for those advanced data structures has hindered its wide adoption (among
> other issues and bugs)
>
> Regards
>
> Duy Hai DOAN
>
> On Tue, 18 Aug 2020 at 13:02, Jasonstack Zhao Yang <
> jasonstack.z...@gmail.com> wrote:
>
> > Mick thanks for your questions.
> >
> > > During the 4.0 beta phase this was intended to be addressed, i.e.
> > > defining more specific QA guidelines for 4.0-rc. This would be an
> > > important step towards QA guidelines for all changes and CEPs post-4.0.
> >
> > Agreed, I think CASSANDRA-15536
> > <https://issues.apache.org/jira/browse/CASSANDRA-15536> (4.0 Quality:
> > Components and Test Plans) has set a good example of QA/Testing.
> >
> > >  - How will this be tested, how will its QA status and lifecycle be
> > > defined? (per above)
> >
> > SAI will follow the same QA/Testing guideline as in CASSANDRA-15536.
> >
> > >  - With existing C* code needing to be changed, what is the proposed
> > > plan for making those changes ensuring maintained QA, e.g. is there
> > > separate QA cycles planned for altering the SPI before adding a new SPI
> > > implementation?
> >
> > The plan is to have interface changes and their new implementations
> > reviewed/tested/merged at once to reduce overhead.
> >
> > But if having interface changes reviewed/tested/merged separately helps
> > quality, I don't think anyone will object.
> >
> > > - Despite being out of scope, it would be nice to have some idea from
> > the>  CEP author of when users might still choose afresh 2i or SASI 

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread Jasonstack Zhao Yang
Mick thanks for your questions.

> During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.

Agreed, I think CASSANDRA-15536
<https://issues.apache.org/jira/browse/CASSANDRA-15536> (4.0 Quality:
Components and Test Plans) has set a good example of QA/Testing.

>  - How will this be tested, how will its QA status and lifecycle be
> defined? (per above)

SAI will follow the same QA/Testing guideline as in CASSANDRA-15536.

>  - With existing C* code needing to be changed, what is the proposed plan
> for making those changes ensuring maintained QA, e.g. is there separate QA
> cycles planned for altering the SPI before adding a new SPI implementation?

The plan is to have interface changes and their new implementations
reviewed/tested/merged at once to reduce overhead.

But if having interface changes reviewed/tested/merged separately helps
quality, I don't think anyone will object.

> - Despite being out of scope, it would be nice to have some idea from the
> CEP author of when users might still choose afresh 2i or SASI over SAI

I'd like SAI to be the only index for users, but this is a decision to be
made by the community.

> - Who fills the roles involved?

Contributors that are still active on C* or related projects:

Andres de la Peña
Caleb Rackliffe
Dan LaRocque
Jason Rutherglen
Mike Adamson
Rocco Varela
Zhao Yang

I will shepherd.

Anyone who is interested in C* indexing, feel free to join us on Slack at
#cassandra-sai.

> - Is there a preference to use gdoc instead of the project's wiki, and
> why? (the CEP process suggest a wiki page, and feedback on why another
> approach is considered better helps evolve the CEP process itself)

I didn't notice that a wiki page is required. I will port the CEP to the wiki.


On Tue, 18 Aug 2020 at 17:39, Mick Semb Wever  wrote:

> >
> > We are looking forward to the community's feedback and suggestions.
> >
>
>
> What comes immediately to mind is testing requirements. It has been
> mentioned already that the project's testability and QA guidelines are
> inadequate to successfully introduce new features and refactorings to the
> codebase. During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.
>
> Questions from me
>  - How will this be tested, how will its QA status and lifecycle be
> defined? (per above)
>  - With existing C* code needing to be changed, what is the proposed plan
> for making those changes ensuring maintained QA, e.g. is there separate QA
> cycles planned for altering the SPI before adding a new SPI implementation?
>  - Despite being out of scope, it would be nice to have some idea from the
> CEP author of when users might still choose afresh 2i or SASI over SAI,
>  - Who fills the roles involved? Who are the contributors in this DataStax
> team? Who is the shepherd? Are there other stakeholders willing to be
> involved?
>  - Is there a preference to use gdoc instead of the project's wiki, and
> why? (the CEP process suggest a wiki page, and feedback on why another
> approach is considered better helps evolve the CEP process itself)
>
> cheers,
> Mick
>


[DISCUSS] CEP-7 Storage Attached Index

2020-08-17 Thread Jasonstack Zhao Yang
Hi,

As per the CEP guideline, I am sending this email to start a discussion
about Storage-Attached-Index[1][2] for Apache Cassandra.

A team at DataStax has developed a new index implementation, called Storage
Attached Index (SAI), based on the advancements made by SASI. SAI improves:

* disk usage, by sharing common data between multiple column indexes on
the same table and by better compression of on-disk structures.
* numeric range query performance with a modified KDTree, and collection
type support.
* compaction performance and stability for larger data sets.

There is a more detailed explanation of the SAI design in the CEP document.
To make the technical discussion simpler, we created a Slack channel,
#cassandra-sai.

We are looking forward to the community's feedback and suggestions.


Regards,

Zhao Yang


[1]
https://docs.google.com/document/d/1V830eAMmQAspjJdjviVZIaSolVGvZ1hVsqOLWyV0DS4/edit#heading=h.cgm22puztagk

[2] https://issues.apache.org/jira/browse/CASSANDRA-16052


Re: Media coordination (was: [VOTE] Release Apache Cassandra 4.0-beta1)

2020-07-21 Thread Jasonstack Zhao Yang
sorry, my phone got unlocked accidentally in my pocket. please ignore the
empty email.

On Tue, 21 Jul 2020 at 18:40, Jasonstack Zhao Yang <
jasonstack.z...@gmail.com> wrote:

>
> On Tue, 21 Jul 2020 at 01:57, Blake Eggleston wrote:
>
>> Characterizing alternate or conflicting points of view as assuming bad
>> intentions without justification is both unproductive and unhealthy for the
>> project.
>>
>> > On Jul 20, 2020, at 9:14 AM, Joshua McKenzie 
>> wrote:
>> >
>> > This kind of back and forth isn't productive for the project so I'm not
>> > taking this discussion further. Just want to call it out here so you or
>> > others aren't left waiting for a reply.
>> >
>> > We can agree to disagree.
>> >
>> > On Mon, Jul 20, 2020 at 11:59 AM Benedict Elliott Smith <
>> bened...@apache.org>
>> > wrote:
>> >
>> >> Firstly, that is a very strong claim that in this particular case is
>> >> disputed by the facts.  You made a very specific claim that the delay
>> was
>> >> "risking our currently lined up coordination with journalists and other
>> >> channels". I am not the only person to interpret this as implying
>> >> coordination with journalists, contingent on a release schedule not
>> agreed
>> >> by the PMC.  This was based on semantics only; as far as I can tell, no
>> >> intentions or assumptions have entered into this debate, except on your
>> >> part.
>> >>
>> >>> Which is the definition of not assuming positive intent.
>> >>
>> >> Secondly, this is not the definition of positive intent.  Positive
>> intent
>> >> only indicates that you "mean well"
>> >>
>> >> Thirdly, in many recent disputes about governance, you have made a
>> >> negative claim about my behaviour, or ascribed negative connotations to
>> >> statements I have made; this is a very thinly veiled example, as I am
>> >> clearly the object of this criticism.  I think it has reached a point
>> where
>> >> I can perhaps legitimately claim that you are not assuming positive
>> intent?
>> >>
>> >>> motives, incentives ... little to do with reality
>> >>
>> >> It feels like we should return to this earlier discussion, since you
>> >> appear to feel it is incomplete?  At the very least you seem to have
>> taken
>> >> the wrong message from my statements, and it is perhaps negatively
>> >> colouring our present interactions.
>> >>
>> >>
>> >> On 20/07/2020, 15:59, "Joshua McKenzie"  wrote:
>> >>
>> >>>
>> >>> If you are criticised, it is often because of the action you took;
>> >>
>> >>Actually, in this case and many others it's because of people's
>> >> unfounded
>> >>assumptions about motives, incentives, and actions taken and has
>> >> little to
>> >>do with reality. Which is the definition of not assuming positive
>> >> intent.
>> >>
>> >>On Mon, Jul 20, 2020 at 10:41 AM Benedict Elliott Smith <
>> >> bened...@apache.org>
>> >>wrote:
>> >>
>> >>> Thanks Sally, really appreciate your insight.
>> >>>
>> >>> To respond to the community discourse around this:
>> >>>
>> >>>> Keep your announcement plans ... private: limit discussions to the
>> >> PMC
>> >>>
>> >>> This is all that I was asking and expecting: if somebody is making
>> >>> commitments on behalf of the community (such as that a release can be
>> >>> expected on day X), this should be coordinated with the PMC.  While
>> >> it
>> >>> seems to transpire that no such commitments were made, had they been
>> >> made
>> >>> without the knowledge of the PMC this would in my view be
>> >> problematic.
>> >>> This is not at all like development work, as has been alleged, since
>> >> that
>> >>> only takes effect after public agreement by the community.
>> >>>
>> >>> IMO, in general, public engagements should be run past the PMC as a
>> >> final
>> >>> pre-flight check regardless of any commitment being made, as the PMC
>> >> should
>> >>> have visibility into these activities

Re: Media coordination (was: [VOTE] Release Apache Cassandra 4.0-beta1)

2020-07-21 Thread Jasonstack Zhao Yang
On Tue, 21 Jul 2020 at 01:57, Blake Eggleston wrote:

> Characterizing alternate or conflicting points of view as assuming bad
> intentions without justification is both unproductive and unhealthy for the
> project.
>
> > On Jul 20, 2020, at 9:14 AM, Joshua McKenzie 
> wrote:
> >
> > This kind of back and forth isn't productive for the project so I'm not
> > taking this discussion further. Just want to call it out here so you or
> > others aren't left waiting for a reply.
> >
> > We can agree to disagree.
> >
> > On Mon, Jul 20, 2020 at 11:59 AM Benedict Elliott Smith <
> bened...@apache.org>
> > wrote:
> >
> >> Firstly, that is a very strong claim that in this particular case is
> >> disputed by the facts.  You made a very specific claim that the delay
> was
> >> "risking our currently lined up coordination with journalists and other
> >> channels". I am not the only person to interpret this as implying
> >> coordination with journalists, contingent on a release schedule not
> agreed
> >> by the PMC.  This was based on semantics only; as far as I can tell, no
> >> intentions or assumptions have entered into this debate, except on your
> >> part.
> >>
> >>> Which is the definition of not assuming positive intent.
> >>
> >> Secondly, this is not the definition of positive intent.  Positive
> intent
> >> only indicates that you "mean well"
> >>
> >> Thirdly, in many recent disputes about governance, you have made a
> >> negative claim about my behaviour, or ascribed negative connotations to
> >> statements I have made; this is a very thinly veiled example, as I am
> >> clearly the object of this criticism.  I think it has reached a point
> where
> >> I can perhaps legitimately claim that you are not assuming positive
> intent?
> >>
> >>> motives, incentives ... little to do with reality
> >>
> >> It feels like we should return to this earlier discussion, since you
> >> appear to feel it is incomplete?  At the very least you seem to have
> taken
> >> the wrong message from my statements, and it is perhaps negatively
> >> colouring our present interactions.
> >>
> >>
> >> On 20/07/2020, 15:59, "Joshua McKenzie"  wrote:
> >>
> >>>
> >>> If you are criticised, it is often because of the action you took;
> >>
> >>Actually, in this case and many others it's because of people's
> >> unfounded
> >>assumptions about motives, incentives, and actions taken and has
> >> little to
> >>do with reality. Which is the definition of not assuming positive
> >> intent.
> >>
> >>On Mon, Jul 20, 2020 at 10:41 AM Benedict Elliott Smith <
> >> bened...@apache.org>
> >>wrote:
> >>
> >>> Thanks Sally, really appreciate your insight.
> >>>
> >>> To respond to the community discourse around this:
> >>>
> >>>> Keep your announcement plans ... private: limit discussions to the
> >> PMC
> >>>
> >>> This is all that I was asking and expecting: if somebody is making
> >>> commitments on behalf of the community (such as that a release can be
> >>> expected on day X), this should be coordinated with the PMC.  While
> >> it
> >>> seems to transpire that no such commitments were made, had they been
> >> made
> >>> without the knowledge of the PMC this would in my view be
> >> problematic.
> >>> This is not at all like development work, as has been alleged, since
> >> that
> >>> only takes effect after public agreement by the community.
> >>>
> >>> IMO, in general, public engagements should be run past the PMC as a
> >> final
> >>> pre-flight check regardless of any commitment being made, as the PMC
> >> should
> >>> have visibility into these activities and have the opportunity to
> >> influence
> >>> them.
> >>>
> >>>> There has been nothing about this internally at DS
> >>>
> >>> I would ask that you refrain from making such claims, unless you can
> >> be
> >>> certain that you would have been privy to all such internal
> >> discussions.
> >>>
> >>>> there's really no reason not to assume best intentions here
> >>>
> >>> This is a recurring talking point, that I wish we would retire except
> >> where
> >>> a clear assumption of bad faith has been made.  If you are
> >> criticised, it
> >>> is often because of the action you took; any intention you had may be
> >>> irrelevant to the criticism.  In this case, when you act on behalf
> >> of the
> >>> community, your intentions are insufficient: you must have the
> >> community's
> >>> authority to act.
> >>>
> >>>
> >>> On 20/07/2020, 14:00, "Sally Khudairi"  wrote:
> >>>
> >>>Hello everyone --Mick pinged me about this; I wanted to respond
> >>> on-list for efficacy.
> >>>
> >>>We've had dozens of companies successfully help Apache Projects
> >> and
> >>> their communities help spread the word on their projects with their
> >> PR and
> >>> marketing teams. Here are some best practices:
> >>>
> >>>1) Timing. Ensure that the Project has announced the project
> >> milestone
> >>> first to their lists as well as announce@ before any media coverage
> >> takes
> >>> place. If you'

Re: [VOTE] Release Apache Cassandra 4.0-beta1 (take2)

2020-07-20 Thread Jasonstack Zhao Yang
+1 (nb)

On Sun, 19 Jul 2020 at 07:01, Ekaterina Dimitrova 
wrote:

> +1(nb)
>
> On Sat, 18 Jul 2020 at 18:13, Jeff Jirsa  wrote:
>
> >
> >
> > +1
> >
> > > On Jul 17, 2020, at 4:28 PM, Mick Semb Wever  wrote:
> > >
> > > Proposing the test build of Cassandra 4.0-beta1 for release.
> > >
> > > sha1: 972da6fcffa87b3a1684362a2bab97db853372d8
> > > Git:
> > >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-beta1-tentative
> > > Maven Artifacts:
> > >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1211/org/apache/cassandra/cassandra-all/4.0-beta1/
> > >
> > > The Source and Build Artifacts, and the Debian and RPM packages and
> > > repositories, are available here:
> > > https://dist.apache.org/repos/dist/dev/cassandra/4.0-beta1/
> > >
> > > The vote will be open for 60 hours (longer if needed). I've taken 12
> > hours
> > > off the normal 72 hours and this follows closely after the initial
> > > 4.0-beta1 vote. Everyone who has tested the build is invited to vote.
> > Votes
> > > by PMC members are considered binding. A vote passes if there are at
> > least
> > > three binding +1s and no -1s.
> > >
> > > Eventual publishing and announcement of the 4.0-beta1 release will be
> > > coordinated, as described in
> > >
> >
> https://lists.apache.org/thread.html/r537fe799e7d5e6d72ac791fdbe9098ef0344c55400c7f68ff65abe51%40%3Cdev.cassandra.apache.org%3E
> > >
> > > [1]: CHANGES.txt:
> > >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-beta1-tentative
> > > [2]: NEWS.txt:
> > >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-beta1-tentative
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Re: [VOTE] Release Apache Cassandra 4.0-beta1

2020-07-15 Thread Jasonstack Zhao Yang
+1 (nb)

On Thu, 16 Jul 2020 at 01:28, Brandon Williams  wrote:

> +1 (binding)
>
> On Tue, Jul 14, 2020, 6:06 PM Mick Semb Wever  wrote:
>
> > Proposing the test build of Cassandra 4.0-beta1 for release.
> >
> > sha1: 5e767711360ecc4bc05a7cd219f0e680bfada004
> > Git:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-beta1-tentative
> > Maven Artifacts:
> >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1210/org/apache/cassandra/cassandra-all/4.0-beta1/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and
> > repositories, are available here:
> > https://dist.apache.org/repos/dist/dev/cassandra/4.0-beta1/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who has
> > tested the build is invited to vote. Votes by PMC members are considered
> > binding. A vote passes if there are at least three binding +1s and no
> -1s.
> >
> > Eventual publishing and announcement of the 4.0-beta1 release will be
> > coordinated, as described in
> >
> >
> https://lists.apache.org/thread.html/r537fe799e7d5e6d72ac791fdbe9098ef0344c55400c7f68ff65abe51%40%3Cdev.cassandra.apache.org%3E
> >
> > [1]: CHANGES.txt:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-beta1-tentative
> > [2]: NEWS.txt:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-beta1-tentative
> >
>


Re: [DISCUSS] Future of MVs

2020-07-01 Thread Jasonstack Zhao Yang
> I agree with Jeff that there is some stuff to do to address the current MV
> issues and I am willing to focus on making them production ready.

+1

On Wed, 1 Jul 2020 at 15:42, Benjamin Lerer 
wrote:

> >
> > "Make the scan faster"
> > "Make the scan incremental and automatic"
> > "Make it not blow up your page cache"
> > "Make losing your base replicas less likely".
> >
> > There's a concrete, real opportunity with MVs to create integrity
> > assertions we're missing. A dangling record from an MV that would point
> to
> > missing base data is something that could raise alarm bells and signal
> > JIRAs so we can potentially find and fix more surprise edge cases.
> >
>
> I agree with Jeff that there is some stuff to do to address the current MV
> issues and I am willing to focus on making them production ready.
>
>
>
>
> On Wed, Jul 1, 2020 at 2:58 AM  wrote:
>
> > It would be incredibly helpful for us to have some empirical data and
> > agreed upon terms and benchmarks to help us navigate discussions like
> this:
> >
> >   * How widely used is a feature  in C* deployments worldwide?
> >   * What are the primary issues users face when deploying them? Scaling
> > them? During failure scenarios?
> >   * What does the engineering effort to bridge these gaps look like? Who
> > will do that? On what time horizon?
> >   * What does our current test coverage for this feature look like?
> >   * What shape of defects are arising with the feature? In a specific
> > subsection of the module or usage?
> >   * Do we have an agreed upon set of standards for labeling a feature
> > stable? As experimental? If not, how do we get there?
> >   * What effort will it take to bridge from where we are to where we
> agree
> > we need to be? On what timeline is this acceptable?
> >
> > I believe these are not only answerable questions, but fundamentally the
> > underlying themes our discussion alludes to. They’re also questions that
> > apply to a lot more than just MV’s and tie into what you’re speaking to
> > above Benedict.
> >
> >
> > > On Jun 30, 2020, at 8:32 PM, sankalp kohli 
> > wrote:
> > >
> > > I see this discussion as several decisions which can be made in small
> > > increments.
> > >
> > > 1. In release cycles, when can we propose a feature to be deprecated or
> > > marked experimental. Ideally a new feature should come out experimental
> > if
> > > required but we have several who are candidates now. We can work on
> > > integrating this in the release lifecycle doc we already have.
> > > 2. What is the process of making an existing feature experimental? How
> > does
> > > it affect major releases around testing.
> > > 3. What is the process of deprecating/removing an experimental feature.
> > > (Assuming experimental features should be deprecated/removed)
> > >
> > > Coming to MV, I think we need more data before we can say we
> > > should deprecate MV. Here are some of them which should be part of
> > > deprecation process
> > > 1. Talk to customers who use them and understand what the impact is.
> Give
> > > them a forum to talk about it.
> > > 2. Do we have enough resources to bring this feature out of the
> > > experimental feature list in next 1 or 2 major releases. We cannot have
> > too
> > > many experimental features in the database. Marking a feature
> > experimental
> > > should not be a parking place for a non functioning feature but a place
> > > while we stabilize it.
> > >
> > >
> > >
> > >
> > >> On Tue, Jun 30, 2020 at 4:52 PM  wrote:
> > >>
> > >> I followed up with the clarification about unit and dtests for that
> > reason
> > >> Dinesh. We test experimental features now.
> > >>
> > >> If we’re talking about adding experimental features to the 40 quality
> > >> testing effort, how does that differ from just saying “we won’t
> release
> > >> until we’ve tested and stabilized these features and they’re no longer
> > >> experimental”?
> > >>
> > >> Maybe I’m just misunderstanding something here?
> > >>
> >  On Jun 30, 2020, at 7:12 PM, Dinesh Joshi 
> wrote:
> > >>>
> > >>> 
> > 
> >  On Jun 30, 2020, at 4:05 PM, Brandon Williams 
> > wrote:
> > 
> >  Instead of ripping it out, we could instead disable them in the yaml
> >  with big fat warning comments around it.  That way people already
> >  using them can just enable them again, but it will raise the bar for
> >  new users who ignore/miss the warnings in the logs and just use
> them.
> > >>>
> > >>> Not a bad idea. Although, the real issue is that users enable MV on
> a 3
> > >> node cluster with a few megs of data and conclude that MVs will
> > >> horizontally scale with the size of data. This is what causes issues
> for
> > >> users who naively roll it out in production and discover that MVs do
> not
> > >> scale with their data growth. So whatever we do, the big fat warning
> > should
> > >> educate the unsuspecting operator.
> > >>>
> > >>> Dinesh
> > >>> 

Re: [DISCUSS] Future of MVs

2020-06-30 Thread Jasonstack Zhao Yang
> While at TLP, I helped numerous customers move off of MVs, mostly because
> they affected stability of clusters in a horrific way.  The most telling
> project involved helping someone create new tables to manage 1GB of data
> because the views performed so poorly they made the cluster unresponsive
> and unusable.

The documented way to report bugs:
https://cassandra.apache.org/doc/latest/bugs.html#

with JIRA, Version, Environment.


> As we move forward with the 4.0 release, we should consider this an
> opportunity to deprecate materialized views, and remove them in 5.0.

While the community is focused on 4.0 and unable to review
CEP/Improvements, should we defer this discussion until the community is
ready to discuss CEP/Improvements?


> We should take this opportunity to learn from the mistake and raise the
> bar for new features to undergo a much more thorough run the wringer before
> merging.

Agreed that we should learn from mistakes, but there are still users relying
on MV. I think it's more responsible to work with users to improve MV for
their use cases.


>  Am I missing a JIRA
> that can magically fix the issues with performance, availability &
> correctness?

Is there any formal discussion/analysis about things being impossible to
fix/improve?

On Wed, 1 Jul 2020 at 04:23, Dinesh Joshi  wrote:

> > On Jun 30, 2020, at 12:43 PM, Jon Haddad  wrote:
> >
> > As we move forward with the 4.0 release, we should consider this an
> > opportunity to deprecate materialized views, and remove them in 5.0.  We
> > should take this opportunity to learn from the mistake and raise the bar
> > for new features to undergo a much more thorough run the wringer before
> > merging.
>
> I'm in favor of marking them as deprecated and removing them in 5.0. If
> someone steps up and can fix them in 5.0, then we always have the option of
> accepting the fix.
>
> Dinesh
>
>


Re: [VOTE] Project governance wiki doc (take 2)

2020-06-20 Thread Jasonstack Zhao Yang
+1 (nb)

On Sat, 20 Jun 2020 at 23:18, Jeff Jirsa  wrote:

> +1 (and present?)
>
>
> > On Jun 20, 2020, at 8:12 AM, Joshua McKenzie 
> wrote:
> >
> > Link to doc:
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Project+Governance
> >
> > Change since previous cancelled vote:
> > "A simple majority of this electorate becomes the low-watermark for votes
> > in favour necessary to pass a motion, with new PMC members added to the
> > calculation."
> >
> > This previously read "super majority". We have lowered the low water mark
> > to "simple majority" to balance strong consensus against risk of stall
> due
> > to low participation.
> >
> >
> >   - Vote will run through 6/24/20
> >   - pmc votes considered binding
> >   - simple majority of binding participants passes the vote
> >   - committer and community votes considered advisory
> >
> > Lastly, I propose we take the count of pmc votes in this thread as our
> > initial roll call count for electorate numbers and low watermark
> > calculation on subsequent votes.
> >
> > Thanks again everyone (and specifically Benedict and Jon) for the time
> and
> > collaboration on this.
> >
> > ~Josh
>
>
>


Re: Keeping test-only changes out of CHANGES.txt

2020-04-08 Thread Jasonstack Zhao Yang
+1

On Thu, Apr 9, 2020, 00:04 Aleksey Yeshchenko 
wrote:

> +1
>
> > On 8 Apr 2020, at 15:08, Mick Semb Wever  wrote:
> >
> > Can we agree on keeping such test changes out of CHANGES.txt ?
> >
> > We already don't put entries into CHANGES.txt if it is not a change
> > from any previous release.
> >
> > There was some discussion before¹ about this, and the problem that
> > being selective meant what ended up there being arbitrary. I think
> > this can be solved with an easy rule of thumb that if it only touches
> > *Test.java classes, or it is only about fixing a test, then it
> > shouldn't be in CHANGES.txt. That means if the patch does touch any
> > runtime code then you do still need to add an entry to CHANGES.txt.
> > This avoids the whole "arbitrary" problem,  and maintains CHANGES.txt
> > as user-facing formatted text to be searched through.
> >
> > If there's agreement I can commit to going through 4.0 changes and
> > removing those that never touched runtime code.
> >
> > regards,
> > Mick
> >
> > ¹)
> https://lists.apache.org/thread.html/a94946887081d8a408dd5cd01a203664f4d0197df713f0c63364a811%40%3Cdev.cassandra.apache.org%3E
> >
> >
>
>
>
>


Re: but there are still the same only 3 compile errors left on the 1 line code in abstractRow.java

2019-07-17 Thread Jasonstack Zhao Yang
Hi,

It's probably just an Eclipse issue: its compiler doesn't handle the lambda properly.

IntelliJ should work just fine, or you can add a type cast "(Function)" for Eclipse.
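
To illustrate the type-cast workaround, here is a minimal sketch. It is not the actual AbstractRow code: the class name, the render method and the Integer/String type arguments are hypothetical stand-ins. It only shows the general shape of a conditional expression whose fallback branch is a bare lambda, with an explicit cast so the compiler does not have to infer the lambda's type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in, not taken from the Cassandra source tree.
class LambdaCastSketch {
    // Use the caller-supplied transform, or fall back to a lambda that
    // maps every element to the empty string.
    static String render(List<Integer> cells, Function<Integer, String> transform) {
        return cells.stream()
                    // The cast gives the lambda an explicit target type, so the
                    // conditional expression no longer relies on inference.
                    .map(transform != null ? transform : (Function<Integer, String>) cell -> "")
                    .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList(1, 2, 3), i -> "c" + i));
        System.out.println(render(Arrays.asList(1, 2, 3), null));
    }
}
```

The type arguments to use in the real code depend on the signature at the reported line, which the error output above does not show in full.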



On Wed, 17 Jul 2019 at 16:32, Nimbus Lin  wrote:

> To Sir Michael:
>
> Thank you for your guiding, my steps are:
> cd /data/workspace/OxyCentos/HadoopCas/cassandra-trunk
>git clone https://github.com/apache/cassandra.git
>cd cassandra-trunk/
>  git checkout -b cassandra-3.11.3
>ant artifacts
>
> The steps all build successfully, and the  returnings info from "ant
> artifacts" is:
> BUILD SUCCESSFUL
> Total time: 5 minutes 58 seconds
> [gloCalHelp.com@gloCalHelp1 cassandra-trunk]$ git status
> # On branch cassandra-3.11.3
> nothing to commit (working directory clean)
> [gloCalHelp.com@gloCalHelp1 cassandra-trunk]$ git log -n1
> commit 31d5d870f9f5b56391db46ba6cdf9e0882d8a5c0
> Merge: 6bcc60a d52c7b8
> Author: Benedict Elliott Smith 
> Date:   Mon Jul 16 17:46:37 2018 +0100
>
> Merge branch 'cassandra-3.0' into HEAD
>
> The "ant generate-eclipse-files" 's returnings are:
>  [copy] Warning: functions modified in the future.
>  [copy] Warning: tokenization modified in the future.
>
> generate-eclipse-files:
> [mkdir] Created dir:
> /data/workspace/OxyCentos/HadoopCas/cassandra-trunk/.settings
>
> BUILD SUCCESSFUL
> Total time: 12 seconds
>
> then I follow the page you offer's steps are:
> Start Eclipse.
> Select File->Import->Existing Projects into Workspace->Select git
> directory.
> Make sure “cassandra-trunk” is recognized and selected as a project
>
> and then from menu Project-> clean -> build only the selected projects..
>
> but there are still the same only 3 errors left on the 1 line code:
>
> 1, The method map(Function) in the type
> Stream is not applicable for the arguments (((transform != null) ?
> transform : ( cell) -> "")) AbstractRow.java
> /cassandra-trunk/src/java/org/apache/cassandra/db/rows  line 183
> Java Problem
> 2, Type mismatch: cannot convert from Function to Function super Cell,? extends R>   AbstractRow.java
> /cassandra-trunk/src/java/org/apache/cassandra/db/rows  line 183
> Java Problem
> 3, Type mismatch: cannot convert from String to R   AbstractRow.java
>   /cassandra-trunk/src/java/org/apache/cassandra/db/rows  line 183
>   Java Problem
>
> how to solve the cassandra 3.11.3 source's compiling errors exactly?
>
>
>
>
> Thank you!
>
> Sincerely
> Nimbuslin(Lin JiaXin)
> Mobile: 0086 180 5986 1565
> Mail: jiaxin...@live.com
>
>
> 
> From: Michael Shuler  on behalf of Michael Shuler
> 
> Sent: Tuesday, July 16, 2019 2:28 PM
> To: dev@cassandra.apache.org
> Subject: Re: Isn't there a workable cassandra java source for developing
> as other big data system?
>
> You have a dirty build environment. Your path of "/cassandra-trunk/src"
> in the error and the suggestion on slack that you are trying to build
> cassandra-3.11.3 shows me you need to start over fresh. Here you go:
>
> build docs:
> http://cassandra.apache.org/doc/latest/development/ide.html
>
> build steps pasted in slack:#cassandra:
>cd /tmp/
>git clone https://github.com/apache/cassandra.git
>cd cassandra/
>git checkout cassandra-3.11.3
>ant artifacts
>
> BUILD SUCCESSFUL
> Total time: 1 minute 32 seconds
>
> --
> Kind regards,
> Michael
>
> On 7/16/19 9:01 AM, Benedict Elliott Smith wrote:
> > 3.11.3 compiles just fine, I have just corroborated.  Sir Jeff is in
> > fact a Cassandra developer, so please feel free to engage with his
> > question, which was designed to help diagnose your problem.
> >
> >
> >> On 16 Jul 2019, at 14:54, Nimbus Lin  wrote:
> >>
> >> To Sir Jeff: your method of "ant realclean" doesn't work, but
> >> delete the needing library in build/jars/.
> >>
> >> To other Cassandra's developers: Hi, is there any Cassandra's
> >> developers here ?, would you like to tell me which version
> >> cassandra's java source is really able to be build and run well?
> >> or  how to solve the only  3 compile errors in  AbstractRow.java
> >> for version 3.11.3?   If cassandra's source is not really free for
> >> developing, then maybe it is better for me to change to other big
> >> data system's source  to develop.
> >>
> >> Isn't there a workable cassandra java source for developing  as
> >> other big data system?
> >>
> >>
> >>
> >> Thank you!
> >>
> >> Sincerely Nimbuslin(Lin JiaXin) Mobile: 0086 180 5986 1565 Mail:
> >> jiaxin...@live.com
> >>
> >>
> >
> >
> >
>
>
> --

Re: Warn about SASI usage and allow to disable them

2019-01-14 Thread Jasonstack Zhao Yang
+1 on yaml config. +1 on disable by default.

On Tue, 15 Jan 2019 at 13:23 Taylor Cressy  wrote:

> +1 on config. +1 on disabling.
>
> +1 on applying it to materialized views as well.
>
> > On Jan 14, 2019, at 17:29, Joshua McKenzie  wrote:
> >
> > +1 on config change, +1 on disabling, and so long as the comments make
> the
> > limitations and risks extremely clear, I'm fine w/out the client warning.
> >
> > On Mon, Jan 14, 2019 at 12:28 PM Andrés de la Peña <
> a.penya.gar...@gmail.com>
> > wrote:
> >
> >> I mean disabling the creation of new SASI indices with CREATE INDEX
> >> statement, the existing indexes would continue working. The CQL client
> >> warning will be thrown with that creation statement as well (if they are
> >> enabled).
> >>
> >>> On Mon, 14 Jan 2019 at 20:18, Jeff Jirsa  wrote:
> >>>
> >>> When we say disable, do you mean disable creation of new SASI indices,
> or
> >>> disable using existing ones? I assume it's just creation of new?
> >>>
> >>> On Mon, Jan 14, 2019 at 11:19 AM Andrés de la Peña <
> >>> a.penya.gar...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hello all,
> >>>>
> >>>> It is my understanding that SASI is still to be considered an
> >>>> experimental/beta feature, and they apparently are not being very
> >>> actively
> >>>> developed. Some highlighted problems in SASI are:
> >>>>
> >>>> - OOMs during flush, as it is described in CASSANDRA-12662
> >>>> - General secondary index consistency problems described in
> >>> CASSANDRA-8272.
> >>>> There is a pending-review patch addressing the problem for regular 2i.
> >>>> However, the proposed solution is based on indexing tombstones. SASI
> >>>> doesn't index tombstones, so it wouldn't be entirely trivial to extend
> >>> the
> >>>> approach to SASI.
> >>>> - Probably insufficient testing. As far as I know, we don't have a
> >> single
> >>>> dtest for SASI nor tests dealing with large SSTables.
> >>>>
> >>>> Similarly to what CASSANDRA-13959 did with materialized views,
> >>>> CASSANDRA-14866 aims to throw a native protocol warning about SASI
> >>>> experimental state, and to add a config property to disable them.
> >> Perhaps
> >>>> this property could be disabled by default in trunk. This should raise
> >>>> awareness about SASI maturity until we get them into a more stable
> state.
> >>>>
> >>>> The purpose for this thread is discussing whether we want to add this
> >>>> warning, the config property and, more controversially, if we want to
> >> set
> >>>> SASI as disabled by default in trunk.
> >>>>
> >>>> WDYT?
> >>>>
> >>>
> >>
>
>
>


CASSANDRA-14925 DecimalSerializer.toString() can OOM

2018-12-14 Thread Jasonstack Zhao Yang
Hi,

I would like to get some feedback on CASSANDRA-14925.

In order to avoid a potential OOM attack, we propose to change
DecimalSerializer.toString() from `BigDecimal.toPlainString()` to
`BigDecimal.toString()` on trunk.

This change should not cause any compatibility issues.

Thanks
Zhao Yang
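
As a rough illustration of why the two methods differ in memory use, the sketch below compares them on a value with a large positive exponent. It uses only the JDK's java.math.BigDecimal; the class and helper names are hypothetical and it does not touch DecimalSerializer itself.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of Cassandra.
class DecimalToStringSketch {
    // Plain notation writes out every digit, so the string length grows
    // with the magnitude of the exponent.
    static int plainLength(BigDecimal value) {
        return value.toPlainString().length();
    }

    // toString() switches to scientific notation when the scale is negative,
    // so the string stays short.
    static String compact(BigDecimal value) {
        return value.toString();
    }

    public static void main(String[] args) {
        // Unscaled value 1 with scale -100000: tiny to store, large to print in plain form.
        BigDecimal value = new BigDecimal("1e100000");
        System.out.println(compact(value));      // 1E+100000
        System.out.println(plainLength(value));  // 100001
    }
}
```

With an exponent near the limit of the scale range, the plain form would need a string on the order of gigabytes, which is presumably the OOM risk the ticket refers to.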


[GitHub] cassandra-dtest pull request #1: CASSANDRA-13526: nodetool cleanup on KS wit...

2017-07-19 Thread jasonstack
Github user jasonstack closed the pull request at:

https://github.com/apache/cassandra-dtest/pull/1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] cassandra-dtest pull request #1: CASSANDRA-13526: nodetool cleanup on KS wit...

2017-07-19 Thread jasonstack
GitHub user jasonstack opened a pull request:

https://github.com/apache/cassandra-dtest/pull/1

CASSANDRA-13526: nodetool cleanup on KS with no replicas should remov…

JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13526   pending for 
2.2/3.0/3.11

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jasonstack/cassandra-dtest-1 CASSANDRA-13526

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra-dtest/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1


commit 3c8877c0fa3eb998ed2ee9945ebb8d43687e65fa
Author: Zhao Yang 
Date:   2017-07-20T03:18:18Z

CASSANDRA-13526: nodetool cleanup on KS with no replicas should remove old 
data, not silently complete







[GitHub] cassandra pull request: Setup travis ci

2016-04-30 Thread jasonstack
Github user jasonstack commented on the pull request:

https://github.com/apache/cassandra/pull/69#issuecomment-215976562
  
Sorry, wrong repo.




[GitHub] cassandra pull request: Setup travis ci

2016-04-30 Thread jasonstack
Github user jasonstack closed the pull request at:

https://github.com/apache/cassandra/pull/69




[GitHub] cassandra pull request: Setup travis ci

2016-04-30 Thread jasonstack
GitHub user jasonstack opened a pull request:

https://github.com/apache/cassandra/pull/69

Setup travis ci



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jasonstack/cassandra setup-travis-ci

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/69.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #69


commit b4b4a535dc7923767014654f4ac974623d59533f
Author: jasonstack 
Date:   2015-03-06T10:48:36Z

Update and rename README.asc to README.md

commit 5f493ca3293f9ba3134f337fa4e71a2bd3d35150
Author: zhao_yang 
Date:   2015-03-06T10:51:13Z

Hello

commit 79f5b48dcaa9bff624c623fc2eb160a361e036c8
Author: jasonstack 
Date:   2015-03-31T03:08:28Z

Merge pull request #1 from apache/trunk

merge from origin

commit 5169a977d24e2a79aa0879bf22bacffe203db2f3
Author: jasonstack 
Date:   2015-03-06T10:48:36Z

Update and rename README.asc to README.md

commit 6a0ef1f8f40fe4f6b7a7c1f1d8b43f73c81fbac4
Author: zhao_yang 
Date:   2015-03-06T10:51:13Z

Hello

commit 28fa8c6924dd83783d461d129b08f5415f13ab49
Author: jasonstack 
Date:   2016-04-10T13:39:51Z

Merge branch 'trunk' of https://github.com/jasonstack/cassandra into trunk

commit 0ba6750d2b26f6eb2701090c4dbb6374852fe37d
Author: jasonstack 
Date:   2016-04-30T16:19:49Z

add travis ci






[GitHub] cassandra pull request: Added TenantAwared Compaction Strategy wit...

2016-03-27 Thread jasonstack
Github user jasonstack closed the pull request at:

https://github.com/apache/cassandra/pull/66




[GitHub] cassandra pull request: Added TenantAwared Compaction Strategy wit...

2016-03-27 Thread jasonstack
GitHub user jasonstack opened a pull request:

https://github.com/apache/cassandra/pull/66

Added TenantAwared Compaction Strategy with CutomizedAntiCompaction a…

…nd SizeTiredCompaction

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jasonstack/cassandra CUSTOM_COMPACTION

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/66.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #66


commit 2656cdbee951b5f15bcddd77b53c89aeb73f512f
Author: Zhao Yang 
Date:   2016-03-26T14:43:03Z

Added TenantAwared Compaction Strategy with CutomizedAntiCompaction and 
SizeTiredCompaction






[GitHub] cassandra pull request: CASSANDRA-9556: Add newer data types to ca...

2015-12-16 Thread jasonstack
Github user jasonstack commented on the pull request:

https://github.com/apache/cassandra/pull/58#issuecomment-165127521
  
Thank you. I know this is a mirror, but my company policy requires me to make
a public PR. Sorry for the trouble.




[GitHub] cassandra pull request: CASSANDRA-9556: Add newer data types to ca...

2015-12-16 Thread jasonstack
Github user jasonstack closed the pull request at:

https://github.com/apache/cassandra/pull/58




[GitHub] cassandra pull request: CASSANDRA-9556: Add newer data types to ca...

2015-12-10 Thread jasonstack
GitHub user jasonstack opened a pull request:

https://github.com/apache/cassandra/pull/58

CASSANDRA-9556: Add newer data types to cassandra stress

eg. TinyInt, Time, Date, SmallInt
Protocol upgrades to NEWEST_SUPPORTED

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WorksApplications/cassandra CASSANDRA-9556

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/58.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #58


commit 0de36aeb8ed0ed876eb55019464fda2f90a253dc
Author: Zhao Yang 
Date:   2015-12-09T06:42:19Z

CASSANDRA-9556: Add newer data types to cassandra stress (e.g. decimal, 
dates, UDTs)



