Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-16 Thread Caleb Rackliffe
o trunk, either hidden behind a CI-only >> flag or exposed to the user via some experimental flag (and a suitable >> NEWS.txt). We’ve discussed the need to periodically merge feature branches >> with trunk before they are complete. If the work is logically complete for

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-16 Thread Mike Adamson
support to be merged into trunk >>>> >>>> FWIW, I’m OK with this merging to trunk, either hidden behind a CI-only >>>> flag or exposed to the user via some experimental flag (and a suitable >>>> NEWS.txt). We’ve discussed the need to periodically m

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Henrik Ingo
On Fri, Feb 11, 2022 at 8:47 PM Caleb Rackliffe wrote: > Just finished reading the latest version of the CEP. Here are my thoughts: > > - We've already talked about OR queries, so I won't rehash that, but > tokenization support seems like it might be another one of those places > where we can

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Caleb Rackliffe
>>> FWIW, I’m OK with this merging to trunk, either hidden behind a CI-only >>>> flag or exposed to the user via some experimental flag (and a suitable >>>> NEWS.txt). We’ve discussed the need to periodically merge feature branches >>>> with trunk before

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Mike Adamson
work to make OR consistent between SAI and non-SAI >> queries, I think that more than meets this criterion. >> >> >> From: Henrik Ingo > <mailto:henrik.i...@datastax.com>> >> Date: Monday, 7 February 2022 at 12:03 >> To: dev@cassandra.apache.org

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-11 Thread Caleb Rackliffe
Ingo > *Date: *Monday, 7 February 2022 at 12:03 > *To: *dev@cassandra.apache.org > *Subject: *Re: [DISCUSS] CEP-7 Storage Attached Index > Thanks Benjamin for reviewing and raising this. > > While I don't speak for the CEP authors, just some thoughts from me: > >

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-10 Thread Mike Adamson
> > > From: Henrik Ingo <mailto:henrik.i...@datastax.com> > Date: Monday, 7 February 2022 at 12:03 > To: dev@cassandra.apache.org <mailto:dev@cassandra.apache.org> > Subject: Re: [DISCUSS] CEP-7 Storage Attached Ind

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-09 Thread bened...@apache.org
day, 7 February 2022 at 12:03 To: dev@cassandra.apache.org Subject: Re: [DISCUSS] CEP-7 Storage Attached Index Thanks Benjamin for reviewing and raising this. While I don't speak for the CEP authors, just some thoughts from me: On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer <mailto:ble...@apache.org>

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-08 Thread Caleb Rackliffe
> I don’t have a strong opinion about CEP-7 taking a hard dependency on any > new CQL CEP, particularly from a point of view of first landing in the > codebase. > > > > > > *From: *Henrik Ingo > *Date: *Monday, 7 February 2022 at 12:03 > *To: *dev@cassandra.apache.org

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-07 Thread bened...@apache.org
I don’t have a strong opinion about CEP-7 taking a hard dependency on any new CQL CEP, particularly from a point of view of first landing in the codebase. From: Henrik Ingo Date: Monday, 7 February 2022 at 12:03 To: dev@cassandra.apache.org Subject: Re: [DISCUSS] CEP-7 Storage Attached Index

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-07 Thread J. D. Jordan
Given this discussion, +1 from me to move OR to its own CEP separate from the new index implementation. > On Feb 7, 2022, at 6:51 AM, Benjamin Lerer wrote: > > >> This was since extended to also support ALLOW FILTERING mode as well as OR >> with clustering key columns. > > If the code is

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-07 Thread Benjamin Lerer
> > This was since extended to also support ALLOW FILTERING mode as well as OR > with clustering key columns. If the code is able to support query using clustering columns without the need for filtering + filtering queries then it should be relatively easy to have full support for CQL. We also

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-07 Thread Henrik Ingo
Thanks Benjamin for reviewing and raising this. While I don't speak for the CEP authors, just some thoughts from me: On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer wrote: > I would like to raise 2 points regarding the current CEP proposal: > > 1. There are mention of some target versions and

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-03 Thread Mike Adamson
I can’t see why there would be any objection to adding a guardrail. I think this is a good idea. MikeA "I see this as a task for a follow-up ticket so long as the CEP’s contributors would not oppose the addition of such a guardrail." > On 3 Feb 2022, at 16:06, C. Scott Andreas wrote: > > I see

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-02 Thread Jeremiah D Jordan
Given the distributed search part is an issue with our secondary indexes in general, and not with any implementation, I don’t see a reason to hold up a vote on CEP-7 for it? -Jeremiah > On Feb 2, 2022, at 10:01 AM, Henrik Ingo wrote: > > So this is an area I've thought about and in fact the

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-02 Thread Henrik Ingo
So this is an area I've thought about and in fact the overall dynamics are the same as for MongoDB secondary indexes in a sharded cluster. The TL;DR is that the benefits far outweigh the limitations: * There's a large area of queries where you have the partition key but not the full Primary Key.

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-02 Thread Joshua McKenzie
To me the outstanding thing worth tackling is the Challenges section Caleb added in the CEP. Specifically: "The only "easy" way around these two challenges is to focus our efforts on queries that are restricted to either partitions or small token ranges. These queries behave well locally even on

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-02 Thread Mike Adamson
Hi, I’d like to restart this thread. We merged the row-aware branch to the SAI codebase just before Christmas and have subsequently updated the CEP to reflect these changes. I would like to move the discussion forward as to how we move this CEP towards a vote. MikeA > On 16 Sep 2021, at

Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-16 Thread DuyHai Doan
Good news, Mike, that row-based indexing will be available; this was a major gap in SASI at that time! On Thu, 16 Sep 2021 at 15:38, Mike Adamson wrote: > Hi, > > Just to keep this thread up to date with development progress, we will be > adding row-aware support to SAI in the next few

Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-16 Thread Mike Adamson
Hi, Just to keep this thread up to date with development progress, we will be adding row-aware support to SAI in the next few weeks. This is currently going through the final stages of review and testing. This feature also adds on-disk versioning to SAI. This allows SAI to support multiple

Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-16 Thread Henrik Ingo
Thanks Caleb. Those observations are valid factual statements, and it's good to be clear where limitations are. I'd like to add that the usefulness of fan-out/broadcast secondary index queries depends on cluster size. I have noticed that everything in Cassandra tends to be designed for extremely

Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-15 Thread Caleb Rackliffe
Hey there, In the spirit of trying to get as many possible objections to a successful vote out of the way, I've added a "Challenges" section to the CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index#CEP7:StorageAttachedIndex-Challenges Most of you will be

Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-09 Thread Patrick McFadin
+1 on introducing this in an incremental manner and after reading through CASSANDRA-16092 that seems like a perfect place to start. I see that work on that Jira has stopped until direction for CEP-7 has been voted in. I say start the vote and let's get this really valuable developer feature

Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-07 Thread Caleb Rackliffe
So this thread stalled almost a year ago. (Wow, time flies when you're trying to release 4.0.) My synthesis of the conversation to this point is that while there are some open questions about testing methodology/"definition of done" and our choice of particular on-disk data structures, neither of

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-24 Thread Jasonstack Zhao Yang
>> Question is: is this planned as a next step? >> If yes, how are we going to mark SAI as experimental until it gets >> row offsets? Also, it is likely that index format is going to change when >> row offsets are added, so my concern is that we may have to support two >> versions of a format for

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-24 Thread Oleksandr Petrov
> But for improving overall index read performance, I think improving base table read perf (because SAI/SASI executes LOTS of SinglePartitionReadCommand after searching on-disk index) is more effective than switching from Trie to Prefix BTree. I haven't suggested switching to Prefix B-Tree or

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Jasonstack Zhao Yang
>> I think CEP should be more upfront with "eventually replace >> it" bit, since it raises the question about what the people who are using >> other index implementations can expect. Will update the CEP to emphasize: SAI will replace other indexes. >> Unfortunately, I do not have an >>

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Benedict Elliott Smith
FWIW, I personally look forward to receiving that contribution when the time is right. On 23/09/2020, 18:45, "Josh McKenzie" wrote: talking about that would involve some bits of information DataStax might not be ready to share? At the risk of derailing, I've been poking and

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Josh McKenzie
talking about that would involve some bits of information DataStax might not be ready to share? At the risk of derailing, I've been poking and prodding this week at we contributors at DS getting our act together w/a draft CEP for donating the trie-based indices to the ASF project. More to come;

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Caleb Rackliffe
As long as we can construct the on-disk indexes efficiently/directly from a Memtable-attached index on flush, there's room to try other data structures. Most of the innovation in SAI is around the layout of postings (something we can expand on if people are interested) and having a natively

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Oleksandr Petrov
I did see a bit about "future parity and beyond" which is more or less an obvious goal. I think CEP should be more upfront with "eventually replace it" bit, since it raises the question about what the people who are using other index implementations can expect. On Wed, Sep 23, 2020 at 6:00 PM

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Oleksandr Petrov
Short question: looking forward, how are we going to maintain three 2i implementations: SASI, SAI, and 2i? Another thing I think this CEP is missing is rationale and motivation about why trie-based indexes were chosen over, say, B-Tree. We did have a short discussion about this on Slack, but both

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-10 Thread Jasonstack Zhao Yang
Thank you, Patrick, for hosting the Cassandra Contributor Meeting for CEP-7 SAI. The recorded video is available here: https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang wrote: > Thank you, Charles

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-01 Thread Jasonstack Zhao Yang
Thank you, Charles and Patrick On Tue, 1 Sep 2020 at 04:56, Charles Cao wrote: > Thank you, Patrick! > > On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin > wrote: > > > > I just moved it to 8AM for this meeting to better accommodate APAC. > Please > > see the update here: > > >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-31 Thread Charles Cao
Thank you, Patrick! On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin wrote: > > I just moved it to 8AM for this meeting to better accommodate APAC. Please > see the update here: > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting > > Patrick >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-31 Thread Patrick McFadin
I just moved it to 8AM for this meeting to better accommodate APAC. Please see the update here: https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting Patrick On Mon, Aug 31, 2020 at 10:04 AM Charles Cao wrote: > Patrick, > > 11AM PST is a bad

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-31 Thread Charles Cao
Patrick, 11AM PST is a bad time for the people in the APAC timezone. Can we move it to 7 or 8AM PST to accommodate their needs? ~Charles On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin wrote: > > Meeting scheduled. >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-28 Thread Patrick McFadin
Meeting scheduled. https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda but if there is more, edit away. Patrick On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-28 Thread Jason Rutherglen
+1 On Thu, Aug 27, 2020 at 1:31 PM Jasonstack Zhao Yang wrote: > > +1 > > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova > wrote: > > > +1 > > > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe > > wrote: > > > > > +1 > > > > > > > > > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin > >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-27 Thread Jasonstack Zhao Yang
+1 On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova wrote: > +1 > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe > wrote: > > > +1 > > > > > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin > wrote: > > > > > > > > > This is related to the discussion Jordan and I had about the >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-26 Thread Ekaterina Dimitrova
+1 On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe wrote: > +1 > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin wrote: > > > > > This is related to the discussion Jordan and I had about the contributor > > > Zoom call. Instead of open mic for any issue, call it based on a > discussion > > >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-26 Thread Caleb Rackliffe
+1 On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin wrote: > This is related to the discussion Jordan and I had about the contributor > Zoom call. Instead of open mic for any issue, call it based on a discussion > thread or threads for higher bandwidth discussion. > > I would be happy to schedule

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-26 Thread Patrick McFadin
This is related to the discussion Jordan and I had about the contributor Zoom call. Instead of open mic for any issue, call it based on a discussion thread or threads for higher bandwidth discussion. I would be happy to schedule one for next week to specifically discuss CEP-7. I can attach the

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-25 Thread Joshua McKenzie
> > Does community plan to open another discussion or CEP on modularization? We probably should have a discussion on the ML or monthly contrib call about it first to see how aligned the interested contributors are. Could do that through CEP as well but CEPs (at least thus far sans k8s operator)

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-25 Thread Jasonstack Zhao Yang
>>> SASI's performance, specifically the search in the B+ tree component, >>> depends a lot on the component file's header being available in the >>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound >>> to this same or similar limitation? SAI also benefits from larger

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-24 Thread Mick Semb Wever
Adding to Duy's questions… * Hardware specs SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound to this same or similar limitation?

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-24 Thread Jasonstack Zhao Yang
> I think the project needs to conclude the discussions that keep being started around the "definition of done" before determining what sufficient quality assurance looks like for this feature. Looking forward to the Test/QA guideline. Thanks for bringing this up. > the CEP process suggest a

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-19 Thread Jasonstack Zhao Yang
Hi Duy, great questions. > 1) SASI was pretty inefficient indexing wide partitions because the index > structure only retains the partition token, not the clustering columns. As > per design doc SAI has row id mapping to partition offset, can we hope that > indexing wide partition will be more

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread DuyHai Doan
Last but not least: 4) Are collections, static columns, composite partition key components and UDT indexing (at any depth) on the roadmap of SAI? I strongly believe that those features are the bare minimum to make SAI an interesting replacement for the native 2nd index as well as SASI. SASI

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread Benedict Elliott Smith
> SAI will follow the same QA/Testing guideline as in CASSANDRA-15536. CASSANDRA-15536 might set some good examples for retrospectively shoring up our quality assurance, but offers no prescriptions for how we approach the testing of new work. I think the project needs to conclude the

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread DuyHai Doan
Thank you, Zhao Yang, for starting this topic. After reading the short design doc, I have a few questions: 1) SASI was pretty inefficient indexing wide partitions because the index structure only retains the partition token, not the clustering columns. As per design doc SAI has row id mapping to

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread Jasonstack Zhao Yang
Mick, thanks for your questions. > During the 4.0 beta phase this was intended to be addressed, i.e. > defining more specific QA guidelines for 4.0-rc. This would be an important > step towards QA guidelines for all changes and CEPs post-4.0. Agreed, I think CASSANDRA-15536

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread Mick Semb Wever
> > We are looking forward to the community's feedback and suggestions. > What comes immediately to mind is testing requirements. It has been mentioned already that the project's testability and QA guidelines are inadequate to successfully introduce new features and refactorings to the codebase.

[DISCUSS] CEP-7 Storage Attached Index

2020-08-17 Thread Jasonstack Zhao Yang
Hi, As per the CEP guideline, I am sending this email to start a discussion about Storage-Attached-Index[1][2] for Apache Cassandra. A team at DataStax has developed a new index implementation, called Storage Attached Index (SAI), based on the advancements made by SASI. SAI improves: * disk usage