o trunk, either hidden behind a CI-only
>> flag or exposed to the user via some experimental flag (and a suitable
>> NEWS.txt). We’ve discussed the need to periodically merge feature branches
>> with trunk before they are complete. If the work is logically complete for
support to be merged into trunk
>>>>
>>>> FWIW, I’m OK with this merging to trunk, either hidden behind a CI-only
>>>> flag or exposed to the user via some experimental flag (and a suitable
>>>> NEWS.txt). We’ve discussed the need to periodically m
On Fri, Feb 11, 2022 at 8:47 PM Caleb Rackliffe
wrote:
> Just finished reading the latest version of the CEP. Here are my thoughts:
>
> - We've already talked about OR queries, so I won't rehash that, but
> tokenization support seems like it might be another one of those places
> where we can
>>>> FWIW, I’m OK with this merging to trunk, either hidden behind a CI-only
>>>> flag or exposed to the user via some experimental flag (and a suitable
>>>> NEWS.txt). We’ve discussed the need to periodically merge feature branches
>>>> with trunk before
work to make OR consistent between SAI and non-SAI
>> queries, I think that more than meets this criterion.
>>
>>
>> From: Henrik Ingo <henrik.i...@datastax.com>
>> Date: Monday, 7 February 2022 at 12:03
>> To: dev@cassandra.apache.org
Ingo
> *Date: *Monday, 7 February 2022 at 12:03
> *To: *dev@cassandra.apache.org
> *Subject: *Re: [DISCUSS] CEP-7 Storage Attached Index
> Thanks Benjamin for reviewing and raising this.
>
> While I don't speak for the CEP authors, just some thoughts from me:
>
>
>
>
> From: Henrik Ingo <henrik.i...@datastax.com>
> Date: Monday, 7 February 2022 at 12:03
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] CEP-7 Storage Attached Ind
day, 7 February 2022 at 12:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-7 Storage Attached Index
Thanks Benjamin for reviewing and raising this.
While I don't speak for the CEP authors, just some thoughts from me:
On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer
<ble...@apache.org>
> I don’t have a strong opinion about CEP-7 taking a hard dependency on any
> new CQL CEP, particularly from a point of view of first landing in the
> codebase.
>
>
>
>
>
> *From: *Henrik Ingo
> *Date: *Monday, 7 February 2022 at 12:03
> *To: *dev@cassandra.apache.org
I don’t have a strong opinion about CEP-7 taking a hard dependency on any new
CQL CEP, particularly from a point of view of first landing in the codebase.
From: Henrik Ingo
Date: Monday, 7 February 2022 at 12:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-7 Storage Attached Index
Given this discussion +1 from me to move OR to its own CEP separate from the
new index implementation.
> On Feb 7, 2022, at 6:51 AM, Benjamin Lerer wrote:
>
>
>> This was since extended to also support ALLOW FILTERING mode as well as OR
>> with clustering key columns.
>
> If the code is
>
> This was since extended to also support ALLOW FILTERING mode as well as OR
> with clustering key columns.
If the code is able to support query using clustering columns without the
need for filtering + filtering queries then it should be relatively easy to
have full support for CQL.
We also
Thanks Benjamin for reviewing and raising this.
While I don't speak for the CEP authors, just some thoughts from me:
On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer wrote:
> I would like to raise 2 points regarding the current CEP proposal:
>
> 1. There are mention of some target versions and
I can’t see why there would be any objection to adding a guardrail. I think this is
a good idea.
MikeA
"I see this as a task for a follow-up ticket so long as the CEP’s contributors
would not oppose the addition of such a guardrail."
> On 3 Feb 2022, at 16:06, C. Scott Andreas wrote:
>
> I see
Given the distributed search part is an issue with our secondary indexes in
general, and not with any implementation, I don’t see a reason to hold up a
vote on CEP-7 for it?
-Jeremiah
> On Feb 2, 2022, at 10:01 AM, Henrik Ingo wrote:
>
> So this is an area I've thought about and in fact the
So this is an area I've thought about and in fact the overall dynamics are
the same as for MongoDB secondary indexes in a sharded cluster. The TL;DR
is that the benefits far outweigh the limitations:
* There's a large area of queries where you have the partition key but not
the full Primary Key.
To me the outstanding thing worth tackling is the Challenges section Caleb
added in the CEP. Specifically:
"The only "easy" way around these two challenges is to focus our efforts on
queries that are restricted to either partitions or small token ranges.
These queries behave well locally even on
Hi,
I’d like to restart this thread.
We merged the row-aware branch to the SAI codebase just before Christmas and
have subsequently updated the CEP to reflect these changes.
I would like to move the discussion forward as to how we move this CEP towards
a vote.
MikeA
> On 16 Sep 2021, at
Good news, Mike, that row-based indexing will be available; this was a major
gap in SASI at that time!
On Thu, 16 Sep 2021 at 15:38, Mike Adamson wrote:
> Hi,
>
> Just to keep this thread up to date with development progress, we will be
> adding row-aware support to SAI in the next few
Hi,
Just to keep this thread up to date with development progress, we will be
adding row-aware support to SAI in the next few weeks. This is currently going
through the final stages of review and testing.
This feature also adds on-disk versioning to SAI. This allows SAI to support
multiple
Thanks Caleb.
Those observations are valid factual statements, and it's good to be clear
where limitations are. I'd like to add that the usefulness of
fan-out/broadcast secondary index queries depends on cluster size. I have
noticed that everything in Cassandra tends to be designed for extremely
Hey there,
In the spirit of trying to get as many possible objections to a successful
vote out of the way, I've added a "Challenges" section to the CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index#CEP7:StorageAttachedIndex-Challenges
Most of you will be
+1 on introducing this in an incremental manner and after reading through
CASSANDRA-16092 that seems like a perfect place to start. I see that work
on that Jira has stopped until direction for CEP-7 has been voted in.
I say start the vote and let's get this really valuable developer feature
So this thread stalled almost a year ago. (Wow, time flies when you're
trying to release 4.0.) My synthesis of the conversation to this point is
that while there are some open questions about testing
methodology/"definition of done" and our choice of particular on-disk data
structures, neither of
>> Question is: is this planned as a next step?
>> If yes, how are we going to mark SAI as experimental until it gets
>> row offsets? Also, it is likely that index format is going to change when
>> row offsets are added, so my concern is that we may have to support two
>> versions of a format for
> But for improving overall index read performance, I think improving base
> table read perf (because SAI/SASI executes LOTS of
> SinglePartitionReadCommand after searching on-disk index) is more effective
> than switching from Trie to Prefix BTree.
I haven't suggested switching to Prefix B-Tree or
>> I think CEP should be more upfront with "eventually replace
>> it" bit, since it raises the question about what the people who are using
>> other index implementations can expect.
Will update the CEP to emphasize: SAI will replace other indexes.
>> Unfortunately, I do not have an
>>
FWIW, I personally look forward to receiving that contribution when the time is
right.
On 23/09/2020, 18:45, "Josh McKenzie" wrote:
talking about that would involve some bits of information DataStax might
not be ready to share?
At the risk of derailing, I've been poking and
talking about that would involve some bits of information DataStax might
not be ready to share?
At the risk of derailing, I've been poking and prodding this week at we
contributors at DS getting our act together w/a draft CEP for donating the
trie-based indices to the ASF project.
More to come;
As long as we can construct the on-disk indexes efficiently/directly from a
Memtable-attached index on flush, there's room to try other data
structures. Most of the innovation in SAI is around the layout of postings
(something we can expand on if people are interested) and having a natively
I did see a bit about "future parity and beyond" which is more or less an
obvious goal. I think CEP should be more upfront with "eventually replace
it" bit, since it raises the question about what the people who are using
other index implementations can expect.
On Wed, Sep 23, 2020 at 6:00 PM
Short question: looking forward, how are we going to maintain three 2i
implementations: SASI, SAI, and 2i?
Another thing I think this CEP is missing is rationale and motivation
about why trie-based indexes were chosen over, say, B-Tree. We did have a
short discussion about this on Slack, but both
Thank you, Patrick, for hosting the Cassandra Contributor Meeting for CEP-7 SAI.
The recorded video is available here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang
wrote:
> Thank you, Charles
Thank you, Charles and Patrick
On Tue, 1 Sep 2020 at 04:56, Charles Cao wrote:
> Thank you, Patrick!
>
> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin
> wrote:
> >
> > I just moved it to 8AM for this meeting to better accommodate APAC. Please
> > see the update here:
> >
>
Thank you, Patrick!
On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin wrote:
>
> I just moved it to 8AM for this meeting to better accommodate APAC. Please
> see the update here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>
> Patrick
>
I just moved it to 8AM for this meeting to better accommodate APAC. Please
see the update here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
Patrick
On Mon, Aug 31, 2020 at 10:04 AM Charles Cao wrote:
> Patrick,
>
> 11AM PST is a bad
Patrick,
11AM PST is a bad time for the people in the APAC timezone. Can we
move it to 7 or 8AM PST in the morning to accommodate their needs?
~Charles
On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin wrote:
>
> Meeting scheduled.
>
Meeting scheduled.
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda but
if there is more, edit away.
Patrick
On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <
+1
On Thu, Aug 27, 2020 at 1:31 PM Jasonstack Zhao Yang
wrote:
>
> +1
>
> On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova
> wrote:
>
> > +1
> >
> > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe
> > wrote:
> >
> > > +1
> > >
> > >
> > >
> > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin
> >
+1
On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova
wrote:
> +1
>
> On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe
> wrote:
>
> > +1
> >
> >
> >
> > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin
> wrote:
> >
> >
> >
> > > This is related to the discussion Jordan and I had about the
>
+1
On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe
wrote:
> +1
>
>
>
> On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin wrote:
>
>
>
> > This is related to the discussion Jordan and I had about the contributor
> > Zoom call. Instead of open mic for any issue, call it based on a discussion
> >
+1
On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin wrote:
> This is related to the discussion Jordan and I had about the contributor
> Zoom call. Instead of open mic for any issue, call it based on a discussion
> thread or threads for higher bandwidth discussion.
>
> I would be happy to schedule
This is related to the discussion Jordan and I had about the contributor
Zoom call. Instead of open mic for any issue, call it based on a discussion
thread or threads for higher bandwidth discussion.
I would be happy to schedule one for next week to specifically discuss
CEP-7. I can attach the
>
> Does the community plan to open another discussion or CEP on modularization?
We probably should have a discussion on the ML or monthly contrib call
about it first to see how aligned the interested contributors are. Could do
that through a CEP as well, but CEPs (at least thus far sans k8s operator)
>>> SASI's performance, specifically the search in the B+ tree component,
>>> depends a lot on the component file's header being available in the
>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
>>> to this same or similar limitation?
SAI also benefits from larger
Adding to Duy's questions…
* Hardware specs
SASI's performance, specifically the search in the B+ tree component,
depends a lot on the component file's header being available in the
pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
to this same or similar limitation?
> I think the project needs to conclude the discussions that keep being
> started around the "definition of done" before determining what sufficient
> quality assurance looks like for this feature.
Looking forward to the Test/QA guideline. Thanks for bringing this up.
> the CEP process suggest a
Hi Duy, great questions.
> 1) SASI was pretty inefficient indexing wide partitions because the index
> structure only retains the partition token, not the clustering columns. As
> per design doc SAI has row id mapping to partition offset, can we hope that
> indexing wide partition will be more
Last but not least
4) Are collections, static columns, composite partition key components, and
UDT indexing (at any depth) on the roadmap of SAI? I strongly believe
that those features are the bare minimum to make SAI an interesting
replacement for the native 2nd index as well as SASI. SASI
> SAI will follow the same QA/Testing guideline as in CASSANDRA-15536.
CASSANDRA-15536 might set some good examples for retrospectively shoring up our
quality assurance, but offers no prescriptions for how we approach the testing
of new work. I think the project needs to conclude the
Thank you Zhao Yang for starting this topic
After reading the short design doc, I have a few questions
1) SASI was pretty inefficient indexing wide partitions because the index
structure only retains the partition token, not the clustering columns. As
per design doc SAI has row id mapping to
Mick thanks for your questions.
> During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.
Agreed, I think CASSANDRA-15536
>
> We are looking forward to the community's feedback and suggestions.
>
What comes immediately to mind is testing requirements. It has been
mentioned already that the project's testability and QA guidelines are
inadequate to successfully introduce new features and refactorings to the
codebase.
Hi,
As per the CEP guideline, I am sending this email to start a discussion
about Storage-Attached-Index[1][2] for Apache Cassandra.
A team at DataStax has developed a new index implementation, called Storage
Attached Index (SAI), based on the advancements made by SASI. SAI improves:
* disk usage