Re: [POLL] Vector type for ML

2023-05-04 Thread Caleb Rackliffe
Even in the ML case, sparse can just mean zeros rather than nulls, and they should compress similarly anyway. If we really want null values, I'd rather leave that in collections space. On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe wrote: > I actually still prefer *type[dimension]*, because I

Re: [POLL] Vector type for ML

2023-05-04 Thread Caleb Rackliffe
I actually still prefer *type[dimension]*, because I think I intuitively read this as a primitive (meaning no null elements) array. Then we can have the indexing apparatus only accept *frozen* for the HNSW case. If that isn't intuitive to anyone else, I don't really have a strong

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Francisco Guerrero
+1 (nb) On 2023/05/04 23:38:08 Yifan Cai wrote: > +1 > > From: Jon Haddad > Sent: Thursday, May 4, 2023 3:31:52 PM > To: dev@cassandra.apache.org > Subject: Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark > Bulk Analytics > > +1. > > Awesome

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Yifan Cai
+1 From: Jon Haddad Sent: Thursday, May 4, 2023 3:31:52 PM To: dev@cassandra.apache.org Subject: Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics +1. Awesome work Doug! Great to see this moving forward. On 2023/05/04 18:34:46

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Jon Haddad
+1. Awesome work Doug! Great to see this moving forward. On 2023/05/04 18:34:46 "C. Scott Andreas" wrote: > +1nb. As someone familiar with this work, it's pretty hard to overstate the > impact it has on completing Cassandra's HTAP story. Eliminating the overhead > of bulk reads and writes on

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Nate McCall
+1 Thanks Doug! On Fri, May 5, 2023 at 4:47 AM Doug Rohrer wrote: > Hello all, > > I’d like to put CEP-28 to a vote. > > Proposal: > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics > > Jira: >

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Brandon Williams
+1 Kind Regards, Brandon On Thu, May 4, 2023 at 11:47 AM Doug Rohrer wrote: > > Hello all, > > I’d like to put CEP-28 to a vote. > > Proposal: > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics > > Jira: >

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Jeremy Hanna
+1 nb, I had to run Cassandra + Hadoop from the early days (0.7+) and it was painful. This is a major step forward. > On May 4, 2023, at 1:44 PM, Patrick McFadin wrote: > > As somebody who gave this talk: https://youtu.be/9xf_IXNylhM I love the > evolution of this topic. Excited to see this!

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Patrick McFadin
As somebody who gave this talk: https://youtu.be/9xf_IXNylhM I love the evolution of this topic. Excited to see this! ++1 nb Patrick On Thu, May 4, 2023 at 11:35 AM C. Scott Andreas wrote: > +1nb. > > As someone familiar with this work, it's pretty hard to overstate the > impact it has on

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread C. Scott Andreas
+1nb. As someone familiar with this work, it's pretty hard to overstate the impact it has on completing Cassandra's HTAP story. Eliminating the overhead of bulk reads and writes on production OLTP clusters is transformative. – Scott On May 4, 2023, at 9:47 AM, Doug Rohrer wrote: Hello all, I’d

Re: [POLL] Vector type for ML

2023-05-04 Thread Patrick McFadin
I agree with David's reasoning and the use of DENSE (and maybe eventually SPARSE). This is terminology well established in the data world, and it would lead to much easier adoption from users. VECTOR is close, but I can see having to create a lot of content around "How to use it and not get in

Re: [POLL] Vector type for ML

2023-05-04 Thread David Capwell
My views have changed over time on syntax and I feel type[dimension] may not be the best, so it has gone lower in my own personal ranking… this is my current preference 1) DENSE [dimension] | NON NULL [dimension] 2) VECTOR 3) type[dimension] My reasoning for this order * type[dimension] looks

[VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Doug Rohrer
Hello all, I’d like to put CEP-28 to a vote. Proposal: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics Jira: https://issues.apache.org/jira/browse/CASSANDRA-16222 Draft implementation: - Apache Cassandra Spark

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread guo Maxwell
Thanks Dinesh, that will be great. Dinesh Joshi wrote on Thu, May 4, 2023 at 11:06 PM: > Hi Guo, > > I would expect that there would be release artifacts for the sidecar as > well as the library once this functionality is available. > > Dinesh > > On May 4, 2023, at 12:03 AM, guo Maxwell wrote: > > This is

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Dinesh Joshi
Hi Guo, I would expect that there would be release artifacts for the sidecar as well as the library once this functionality is available. Dinesh > On May 4, 2023, at 12:03 AM, guo Maxwell wrote: > > This is very meaningful work, thanks, but I would like to ask a question > that is not

Re: [POLL] Vector type for ML

2023-05-04 Thread Brandon Williams
1. VECTOR 2. VECTOR FLOAT[n] 3. FLOAT[N] (Non null by default) Redundant or not, I think having the VECTOR keyword helps signify what the app is generally about and helps get buy-in from ML stakeholders. On Thu, May 4, 2023 at 3:45 AM Benedict wrote: > > Hurrah for initial agreement. > > For

Re: [POLL] Vector type for ML

2023-05-04 Thread Mike Adamson
That's fair comment. In this case I would be happy with any of your suggestions although I would prefer that the datatype did not support nulls. On Thu, 4 May 2023 at 11:55, Benedict wrote: > I would expect that the type of index would be specified anyway? > > I don’t think it’s good API design

Re: [POLL] Vector type for ML

2023-05-04 Thread Benedict
I would expect that the type of index would be specified anyway? I don’t think it’s good API design to have the field define the index you create - only to shape what is permitted. An HNSW index is very specific and should be asked for specifically, not implicitly, IMO. On 4 May 2023, at 11:47, Mike

Re: [POLL] Vector type for ML

2023-05-04 Thread Mike Adamson
> > For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N], > VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t > think VECTOR should be used to simply imply non-null, as this would be very > unintuitive. More logical would be NONNULL, if this is the only

Re: [POLL] Vector type for ML

2023-05-04 Thread Benedict
Hurrah for initial agreement. For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N], VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t think VECTOR should be used to simply imply non-null, as this would be very unintuitive. More logical would be NONNULL, if

Re: [VOTE] Release Apache Cassandra 3.11.15

2023-05-04 Thread Tommy Stendahl via dev
+1 (nb) -Original Message- From: "Miklosovic, Stefan" Reply-To: dev@cassandra.apache.org To: dev@cassandra.apache.org

Re: [POLL] Vector type for ML

2023-05-04 Thread Mick Semb Wever
> > Did we agree on a CQL syntax? > > I don’t believe there has been a poll on CQL syntax… my understanding > reading all the threads is that there are ~4-5 options and none are -1ed, so > believe we are waiting for majority rule on this? > Re-reading that thread, IIUC the valid choices remaining

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread guo Maxwell
This is very meaningful work, thanks, but I would like to ask a question that is not particularly related to the CEP's code design itself but to the project's engineering management: what is the future development and release plan of this project? As far as I know, project Cassandra Sidecar