Re: [DISCUSS] Vector type and empty value

2023-09-22 Thread Andrés de la Peña
I have just created CASSANDRA-18876 for this. I'll post a patch very soon.

On Wed, 20 Sept 2023 at 19:41, David Capwell  wrote:

> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility.
>
>
> Given that java driver has a different behavior from server, I wouldn’t be
> shocked to see that other drivers also have their own custom behaviors… so
> not clear how to migrate unless we actually hand a user facing standard per
> type… if all drivers use a “default value” and is consistent, I do think we
> could migrate, but would need to live with this till at least 6.0+
>
> We can only prevent its use in the CQL layer where support isn’t required.
>
>
> +1
>
> On Sep 20, 2023, at 7:38 AM, Benedict  wrote:
>
> Yes, if this is what was meant by empty I agree. It’s nonsensical for most
> types. Apologies for any confusion.
>
> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility. We can only prevent its use in
> the CQL layer where support isn’t required. My understanding was that we
> had at least tried to do this for all non-thrift schemas, but perhaps we
> did not do so thoroughly and now may have some CQL legacy support
> requirements as well.
>
> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko  wrote:
>
> Allowing zero-length byte arrays for most old types is just a legacy from
> Darker Days. It’s a distinct concern from columns being nullable or not.
>
> There are a couple types where this makes sense: strings and blobs. All
> else should not allow this except for backward compatibility reasons. So,
> not for new types.
>
> On 20 Sep 2023, at 00:08, David Capwell  wrote:
>
> When does empty mean null?
>
>
>
> Most types are this way
>
> @Test
> public void nullExample()
> {
> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new
> byte[0]));
> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
> if (result.has("cuteness")) System.out.println("Cuteness score: " +
> result.getInt("cuteness"));
> else System.out.println("Cuteness score is undefined");
> }
>
>
> This test will NPE in getInt as the returned BB is seen as “null” for
> int32 type, you can make it “safer” by changing to the following
>
> if (result.has("cuteness")) System.out.println("Cuteness score: " +
> Int32Type.instance.compose(result.getBlob("cuteness")));
>
> Now we get the log "Cuteness score: null”
>
> What’s even better (just found this out) is that client isn’t consistent
> or correct in these cases!
>
> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT,
> "SELECT * FROM %s WHERE pk=0").one();
> if (result.getBytesUnsafe("cuteness") != null)
> System.out.println("Cuteness score: " + result.getInt("cuteness"));
> else System.out.println("Cuteness score is undefined");
>
> This prints "Cuteness score: 0”
>
> So for Cassandra we think the value is “null” but java driver thinks it’s
> 0?
>
> Do we have types where writing an empty value creates a tombstone?
>
>
> Empty does not generate a tombstone for any type, but empty has a similar
> user experience as we return null in both cases (but just found out that
> the drivers may not be consistent with this…)
>
> On Sep 19, 2023, at 3:33 PM, J. D. Jordan 
> wrote:
>
>
> When does empty mean null?  My understanding was that empty is a valid
> value for the types that support it, separate from null (aka a tombstone).
> Do we have types where writing an empty value creates a tombstone?
>
> I agree with David that my preference would be for only blob and string
> like types to support empty. It’s too late for the existing types, but we
> should hold to this going forward. Which is what I think the idea was in
> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it
> was sad the existing numerics were emptiable, but too late to change, and
> we could correct it for newer types.
>
> On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
>
> 
>
>
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
> types non-emptiable. This approach makes more sense to me as having to
> deal with empty value is error prone in my opinion.
>
>
> I agree it’s confusing, and in the patch I found that different code paths
> didn’t handle things correctly as we have some types (most) that support
> empty bytes, and some that do not…. Empty also has different meaning in
> different code paths; for most it means “null”, and for some other types it
> means “empty”…. To try to make things more clear I added
> org.apache.cassandra.db.marshal.AbstractType#isNull(V,
> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so
> each type can define if empty is null or not.
>
> I also think that it would be good to standardize on one approach to avoid
> confusion.
>
>
> I agree, but also don’t feel it’s a perfect one-size-fits-all thing….
> Let’s 

Re: [DISCUSS] Vector type and empty value

2023-09-20 Thread David Capwell
> I don’t think we can readily migrate old types away from this however, 
> without breaking backwards compatibility. 

Given that the Java driver behaves differently from the server, I wouldn’t be 
shocked to see that other drivers also have their own custom behaviors… so it’s 
not clear how to migrate unless we actually had a user-facing standard per type… 
if all drivers used a “default value” consistently, I do think we could 
migrate, but we would need to live with this until at least 6.0+

> We can only prevent its use in the CQL layer where support isn’t required.

+1

> On Sep 20, 2023, at 7:38 AM, Benedict  wrote:
> 
> Yes, if this is what was meant by empty I agree. It’s nonsensical for most 
> types. Apologies for any confusion.
> 
> I don’t think we can readily migrate old types away from this however, 
> without breaking backwards compatibility. We can only prevent its use in the 
> CQL layer where support isn’t required. My understanding was that we had at 
> least tried to do this for all non-thrift schemas, but perhaps we did not do 
> so thoroughly and now may have some CQL legacy support requirements as well.
> 
>> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko  wrote:
>> 
>> Allowing zero-length byte arrays for most old types is just a legacy from 
>> Darker Days. It’s a distinct concern from columns being nullable or not.
>> 
>> There are a couple types where this makes sense: strings and blobs. All else 
>> should not allow this except for backward compatibility reasons. So, not for 
>> new types.
>> 
 On 20 Sep 2023, at 00:08, David Capwell  wrote:
 
 When does empty mean null?
>>> 
>>> 
>>> Most types are this way
>>> 
>>> @Test
>>> public void nullExample()
>>> {
>>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
>>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new 
>>> byte[0]));
>>> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
>>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>>> result.getInt("cuteness"));
>>> else System.out.println("Cuteness score is undefined");
>>> }
>>> 
>>> 
>>> This test will NPE in getInt as the returned BB is seen as “null” for int32 
>>> type, you can make it “safer” by changing to the following
>>> 
>>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>>> Int32Type.instance.compose(result.getBlob("cuteness")));
>>> 
>>> Now we get the log "Cuteness score: null”
>>> 
>>> What’s even better (just found this out) is that client isn’t consistent or 
>>> correct in these cases!
>>> 
>>> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, 
>>> "SELECT * FROM %s WHERE pk=0").one();
>>> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness 
>>> score: " + result.getInt("cuteness"));
>>> else System.out.println("Cuteness score is undefined");
>>> 
>>> This prints "Cuteness score: 0”
>>> 
>>> So for Cassandra we think the value is “null” but java driver thinks it’s 0?
>>> 
 Do we have types where writing an empty value creates a tombstone?
>>> 
>>> Empty does not generate a tombstone for any type, but empty has a similar 
>>> user experience as we return null in both cases (but just found out that 
>>> the drivers may not be consistent with this…)
>>> 
> On Sep 19, 2023, at 3:33 PM, J. D. Jordan  
> wrote:
 
 When does empty mean null?  My understanding was that empty is a valid 
 value for the types that support it, separate from null (aka a tombstone). 
 Do we have types where writing an empty value creates a tombstone?
 
 I agree with David that my preference would be for only blob and string 
 like types to support empty. It’s too late for the existing types, but we 
 should hold to this going forward. Which is what I think the idea was in 
 https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was 
 sad the existing numerics were emptiable, but too late to change, and we 
 could correct it for newer types.
 
> On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
> 
> 
>> 
>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started 
>> making types non-emptiable. This approach makes more sense to me as 
>> having to deal with empty value is error prone in my opinion.
> 
> I agree it’s confusing, and in the patch I found that different code 
> paths didn’t handle things correctly as we have some types (most) that 
> support empty bytes, and some that do not…. Empty also has different 
> meaning in different code paths; for most it means “null”, and for some 
> other types it means “empty”…. To try to make things more clear I added 
> org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so 
> each type can define if empty is null or not.
> 
>> I also think that it would be good to 

Re: [DISCUSS] Vector type and empty value

2023-09-20 Thread David Capwell
> So, not for new types.

> Should we make the Vector type non-emptiable and stick to it for the new 
> types?

Yep, works for me.

We should also update the test 
org.apache.cassandra.db.marshal.AbstractTypeTest#empty to detect this for new 
types, by making org.apache.cassandra.db.marshal.AbstractType#allowsEmpty 
default to false and overriding it in all legacy types.

More than glad to review any patch to fix this issue!
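
To make the proposal concrete, here is a minimal standalone sketch of the "default to false, override in legacy types" idea. It is not the Cassandra code: SketchType, LegacyIntType, TextType, VectorType and emptiableTypeNames are hypothetical names used only for illustration, and the real AbstractType#allowsEmpty may differ in signature and semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllowsEmptyDemo
{
    // Hypothetical stand-in for a type base class: new types are non-emptiable by default.
    static abstract class SketchType
    {
        final String name;

        SketchType(String name) { this.name = name; }

        boolean allowsEmpty() { return false; }
    }

    // Legacy numeric type: must keep accepting empty for backward compatibility.
    static final class LegacyIntType extends SketchType
    {
        LegacyIntType() { super("int"); }

        @Override
        boolean allowsEmpty() { return true; }
    }

    // Text: empty is a meaningful value for the domain.
    static final class TextType extends SketchType
    {
        TextType() { super("text"); }

        @Override
        boolean allowsEmpty() { return true; }
    }

    // New type: inherits the non-emptiable default without any extra code.
    static final class VectorType extends SketchType
    {
        VectorType() { super("vector"); }
    }

    // What a test could assert on: the set of emptiable types is an explicit, reviewed list.
    static List<String> emptiableTypeNames(List<SketchType> types)
    {
        List<String> names = new ArrayList<>();
        for (SketchType type : types)
            if (type.allowsEmpty())
                names.add(type.name);
        return names;
    }

    public static void main(String[] args)
    {
        List<SketchType> all = Arrays.asList(new LegacyIntType(), new TextType(), new VectorType());
        System.out.println(emptiableTypeNames(all)); // prints [int, text]
    }
}
```

With this shape, a new type that forgets to think about empty values ends up non-emptiable, and a test comparing the emptiable set against a fixed list would flag any new type that opts in.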

> On Sep 20, 2023, at 7:16 AM, Aleksey Yeshchenko  wrote:
> 
> Allowing zero-length byte arrays for most old types is just a legacy from 
> Darker Days. It’s a distinct concern from columns being nullable or not.
> 
> There are a couple types where this makes sense: strings and blobs. All else 
> should not allow this except for backward compatibility reasons. So, not for 
> new types.
> 
>> On 20 Sep 2023, at 00:08, David Capwell  wrote:
>> 
>>> When does empty mean null?
>> 
>> 
>> Most types are this way
>> 
>> @Test
>> public void nullExample()
>> {
>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new 
>> byte[0]));
>> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>> result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined");
>> }
>> 
>> 
>> This test will NPE in getInt as the returned BB is seen as “null” for int32 
>> type, you can make it “safer” by changing to the following
>> 
>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>> Int32Type.instance.compose(result.getBlob("cuteness")));
>> 
>> Now we get the log "Cuteness score: null”
>> 
>> What’s even better (just found this out) is that client isn’t consistent or 
>> correct in these cases!
>> 
>> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, 
>> "SELECT * FROM %s WHERE pk=0").one();
>> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness 
>> score: " + result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined");
>> 
>> This prints "Cuteness score: 0”
>> 
>> So for Cassandra we think the value is “null” but java driver thinks it’s 0?
>> 
>>> Do we have types where writing an empty value creates a tombstone?
>> 
>> Empty does not generate a tombstone for any type, but empty has a similar 
>> user experience as we return null in both cases (but just found out that the 
>> drivers may not be consistent with this…)
>> 
>>> On Sep 19, 2023, at 3:33 PM, J. D. Jordan  wrote:
>>> 
>>> When does empty mean null?  My understanding was that empty is a valid 
>>> value for the types that support it, separate from null (aka a tombstone). 
>>> Do we have types where writing an empty value creates a tombstone?
>>> 
>>> I agree with David that my preference would be for only blob and string 
>>> like types to support empty. It’s too late for the existing types, but we 
>>> should hold to this going forward. Which is what I think the idea was in 
>>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was 
>>> sad the existing numerics were emptiable, but too late to change, and we 
>>> could correct it for newer types.
>>> 
 On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
 
 
> 
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started 
> making types non-emptiable. This approach makes more sense to me as 
> having to deal with empty value is error prone in my opinion.
 
 I agree it’s confusing, and in the patch I found that different code paths 
 didn’t handle things correctly as we have some types (most) that support 
 empty bytes, and some that do not…. Empty also has different meaning in 
 different code paths; for most it means “null”, and for some other types 
 it means “empty”…. To try to make things more clear I added 
 org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
 org.apache.cassandra.db.marshal.ValueAccessor) to the type system so 
 each type can define if empty is null or not.
 
> I also think that it would be good to standardize on one approach to 
> avoid confusion.
 
 I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. 
 Let’s say I have a “blob” type and I write an empty byte… what does this 
 mean?  What does it mean for "text" type?  The fact I get back a null in 
 both those cases was very confusing to me… I do feel that some types 
 should support empty, and the common code of empty == null I think is very 
 brittle (blob/text was not correct in different places due to this...)… so 
 I am cool with removing that relationship, but don’t think we should have 
 a rule blocking empty for all current / future types as it some times does 
 make sense.
 
> empty vector (I presume) for the vector type?
 
 Empty vectors (vector[0]) are 

Re: [DISCUSS] Vector type and empty value

2023-09-20 Thread Aleksey Yeshchenko
Allowing zero-length byte arrays for most old types is just a legacy from 
Darker Days. It’s a distinct concern from columns being nullable or not.

There are a couple types where this makes sense: strings and blobs. All else 
should not allow this except for backward compatibility reasons. So, not for 
new types.
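
As an illustration of why zero-length input is natural for strings and blobs but not for fixed-width types, here is a minimal standalone sketch. It is not Cassandra's serializer code: decodeText and isValidFixedWidth are hypothetical helpers, and the legacyAllowsEmpty flag only models the backward-compatibility exception discussed in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EmptyValueDemo
{
    // Zero bytes decode to a perfectly valid value for text: the empty string.
    static String decodeText(ByteBuffer bytes)
    {
        // duplicate() so the caller's position is left untouched
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }

    // A fixed-width type has no natural zero-length value; accepting it is only a legacy exception.
    static boolean isValidFixedWidth(ByteBuffer bytes, int width, boolean legacyAllowsEmpty)
    {
        int remaining = bytes.remaining();
        return remaining == width || (legacyAllowsEmpty && remaining == 0);
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
        System.out.println("text is empty string: " + decodeText(empty).isEmpty());  // true
        System.out.println("legacy int accepts empty: " + isValidFixedWidth(empty, 4, true));   // true
        System.out.println("smallint accepts empty: " + isValidFixedWidth(empty, 2, false));    // false
    }
}
```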

> On 20 Sep 2023, at 00:08, David Capwell  wrote:
> 
>> When does empty mean null?
> 
> 
> Most types are this way
> 
> @Test
> public void nullExample()
> {
> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new 
> byte[0]));
> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
> result.getInt("cuteness"));
> else System.out.println("Cuteness score is undefined");
> }
> 
> 
> This test will NPE in getInt as the returned BB is seen as “null” for int32 
> type, you can make it “safer” by changing to the following
> 
> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
> Int32Type.instance.compose(result.getBlob("cuteness")));
> 
> Now we get the log "Cuteness score: null”
> 
> What’s even better (just found this out) is that client isn’t consistent or 
> correct in these cases!
> 
> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, 
> "SELECT * FROM %s WHERE pk=0").one();
> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness 
> score: " + result.getInt("cuteness"));
> else System.out.println("Cuteness score is undefined");
> 
> This prints "Cuteness score: 0”
> 
> So for Cassandra we think the value is “null” but java driver thinks it’s 0?
> 
>> Do we have types where writing an empty value creates a tombstone?
> 
> Empty does not generate a tombstone for any type, but empty has a similar 
> user experience as we return null in both cases (but just found out that the 
> drivers may not be consistent with this…)
> 
>> On Sep 19, 2023, at 3:33 PM, J. D. Jordan  wrote:
>> 
>> When does empty mean null?  My understanding was that empty is a valid value 
>> for the types that support it, separate from null (aka a tombstone). Do we 
>> have types where writing an empty value creates a tombstone?
>> 
>> I agree with David that my preference would be for only blob and string like 
>> types to support empty. It’s too late for the existing types, but we should 
>> hold to this going forward. Which is what I think the idea was in 
>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was 
>> sad the existing numerics were emptiable, but too late to change, and we 
>> could correct it for newer types.
>> 
>>> On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
>>> 
>>> 
 
 When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
 types non-emptiable. This approach makes more sense to me as having to 
 deal with empty value is error prone in my opinion.
>>> 
>>> I agree it’s confusing, and in the patch I found that different code paths 
>>> didn’t handle things correctly as we have some types (most) that support 
>>> empty bytes, and some that do not…. Empty also has different meaning in 
>>> different code paths; for most it means “null”, and for some other types it 
>>> means “empty”…. To try to make things more clear I added 
>>> org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
>>> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so 
>>> each type can define if empty is null or not.
>>> 
 I also think that it would be good to standardize on one approach to avoid 
 confusion.
>>> 
>>> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s 
>>> say I have a “blob” type and I write an empty byte… what does this mean?  
>>> What does it mean for "text" type?  The fact I get back a null in both 
>>> those cases was very confusing to me… I do feel that some types should 
>>> support empty, and the common code of empty == null I think is very brittle 
>>> (blob/text was not correct in different places due to this...)… so I am 
>>> cool with removing that relationship, but don’t think we should have a rule 
>>> blocking empty for all current / future types as it some times does make 
>>> sense.
>>> 
 empty vector (I presume) for the vector type?
>>> 
>>> Empty vectors (vector[0]) are blocked at the type level, the smallest 
>>> vector is vector[1]
>>> 
 as types that can never be null
>>> 
>>> One pro here is that “null” is cheaper (in some regards) than delete 
>>> (though we can never purge), but having 2 similar behaviors (write null, do 
>>> a delete) at the type level is a bit confusing… Right now I am allowed to 
>>> do the following (the below isn’t valid CQL, it’s a hybrid of CQL + Java 
>>> code…)
>>> 
>>> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
>>> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])
>>> 
>>> CREATE TABLE 

Re: [DISCUSS] Vector type and empty value

2023-09-20 Thread Benedict
Yes, if this is what was meant by empty I agree. It’s nonsensical for most 
types. Apologies for any confusion.

I don’t think we can readily migrate old types away from this however, without 
breaking backwards compatibility. We can only prevent its use in the CQL layer 
where support isn’t required. My understanding was that we had at least tried 
to do this for all non-thrift schemas, but perhaps we did not do so thoroughly 
and now may have some CQL legacy support requirements as well.

> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko  wrote:
> 
> Allowing zero-length byte arrays for most old types is just a legacy from 
> Darker Days. It’s a distinct concern from columns being nullable or not.
> 
> There are a couple types where this makes sense: strings and blobs. All else 
> should not allow this except for backward compatibility reasons. So, not for 
> new types.
> 
>>> On 20 Sep 2023, at 00:08, David Capwell  wrote:
>>> 
>>> When does empty mean null?
>> 
>> 
>> Most types are this way
>> 
>> @Test
>> public void nullExample()
>> {
>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new 
>> byte[0]));
>> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>> result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined");
>> }
>> 
>> 
>> This test will NPE in getInt as the returned BB is seen as “null” for int32 
>> type, you can make it “safer” by changing to the following
>> 
>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>> Int32Type.instance.compose(result.getBlob("cuteness")));
>> 
>> Now we get the log "Cuteness score: null”
>> 
>> What’s even better (just found this out) is that client isn’t consistent or 
>> correct in these cases!
>> 
>> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, 
>> "SELECT * FROM %s WHERE pk=0").one();
>> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness 
>> score: " + result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined");
>> 
>> This prints "Cuteness score: 0”
>> 
>> So for Cassandra we think the value is “null” but java driver thinks it’s 0?
>> 
>>> Do we have types where writing an empty value creates a tombstone?
>> 
>> Empty does not generate a tombstone for any type, but empty has a similar 
>> user experience as we return null in both cases (but just found out that the 
>> drivers may not be consistent with this…)
>> 
 On Sep 19, 2023, at 3:33 PM, J. D. Jordan  
 wrote:
>>> 
>>> When does empty mean null?  My understanding was that empty is a valid 
>>> value for the types that support it, separate from null (aka a tombstone). 
>>> Do we have types where writing an empty value creates a tombstone?
>>> 
>>> I agree with David that my preference would be for only blob and string 
>>> like types to support empty. It’s too late for the existing types, but we 
>>> should hold to this going forward. Which is what I think the idea was in 
>>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was 
>>> sad the existing numerics were emptiable, but too late to change, and we 
>>> could correct it for newer types.
>>> 
 On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
 
 
> 
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started 
> making types non-emptiable. This approach makes more sense to me as 
> having to deal with empty value is error prone in my opinion.
 
 I agree it’s confusing, and in the patch I found that different code paths 
 didn’t handle things correctly as we have some types (most) that support 
 empty bytes, and some that do not…. Empty also has different meaning in 
 different code paths; for most it means “null”, and for some other types 
 it means “empty”…. To try to make things more clear I added 
 org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
 org.apache.cassandra.db.marshal.ValueAccessor) to the type system so 
 each type can define if empty is null or not.
 
> I also think that it would be good to standardize on one approach to 
> avoid confusion.
 
 I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. 
 Let’s say I have a “blob” type and I write an empty byte… what does this 
 mean?  What does it mean for "text" type?  The fact I get back a null in 
 both those cases was very confusing to me… I do feel that some types 
 should support empty, and the common code of empty == null I think is very 
 brittle (blob/text was not correct in different places due to this...)… so 
 I am cool with removing that relationship, but don’t think we should have 
 a rule blocking empty for all current / future types as it some times does 
 make sense.
 
> empty vector 

Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread David Capwell
> When does empty mean null?


Most types are this way

@Test
public void nullExample()
{
    createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
    execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new byte[0]));
    Row result = execute("SELECT * FROM %s WHERE pk=0").one();
    if (result.has("cuteness"))
        System.out.println("Cuteness score: " + result.getInt("cuteness"));
    else
        System.out.println("Cuteness score is undefined");
}


This test throws an NPE in getInt because the returned ByteBuffer is seen as 
“null” for the int32 type. You can make it “safer” by changing it to the following:

if (result.has("cuteness"))
    System.out.println("Cuteness score: " + Int32Type.instance.compose(result.getBlob("cuteness")));

Now we get the log "Cuteness score: null"

What’s even better (just found this out) is that the client isn’t consistent or 
correct in these cases!

com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, "SELECT * FROM %s WHERE pk=0").one();
if (result.getBytesUnsafe("cuteness") != null)
    System.out.println("Cuteness score: " + result.getInt("cuteness"));
else
    System.out.println("Cuteness score is undefined");

This prints "Cuteness score: 0"

So for Cassandra we think the value is “null”, but the Java driver thinks it’s 0?

> Do we have types where writing an empty value creates a tombstone?

Empty does not generate a tombstone for any type, but it gives a similar user 
experience because we return null in both cases (though I just found out that 
the drivers may not be consistent about this…)
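
The mismatch reported above comes down to how a decoder treats a zero-length buffer. The following is a minimal standalone sketch, not the server's or the Java driver's actual code: composeNullOnEmpty and decodeZeroOnEmpty are hypothetical names that only mimic the two behaviors observed in the outputs above.

```java
import java.nio.ByteBuffer;

public class EmptyIntDemo
{
    // Mimics the server-side result above: zero remaining bytes is reported as "no value".
    static Integer composeNullOnEmpty(ByteBuffer bytes)
    {
        if (bytes == null || bytes.remaining() == 0)
            return null;
        return bytes.getInt(bytes.position()); // absolute read, does not move the position
    }

    // Mimics the driver-side result above: zero remaining bytes falls back to the primitive default.
    static int decodeZeroOnEmpty(ByteBuffer bytes)
    {
        if (bytes == null || bytes.remaining() == 0)
            return 0;
        return bytes.getInt(bytes.position());
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
        System.out.println("null-on-empty: " + composeNullOnEmpty(empty)); // null-on-empty: null
        System.out.println("zero-on-empty: " + decodeZeroOnEmpty(empty));  // zero-on-empty: 0
    }
}
```

The same stored bytes yield "undefined" through one path and a real-looking score of 0 through the other, which is why a per-type standard for empty values would have to be shared by the server and every driver.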

> On Sep 19, 2023, at 3:33 PM, J. D. Jordan  wrote:
> 
> When does empty mean null?  My understanding was that empty is a valid value 
> for the types that support it, separate from null (aka a tombstone). Do we 
> have types where writing an empty value creates a tombstone?
> 
> I agree with David that my preference would be for only blob and string like 
> types to support empty. It’s too late for the existing types, but we should 
> hold to this going forward. Which is what I think the idea was in 
> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was 
> sad the existing numerics were emptiable, but too late to change, and we 
> could correct it for newer types.
> 
>> On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
>> 
>> 
>>> 
>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>>> types non-emptiable. This approach makes more sense to me as having to 
>>> deal with empty value is error prone in my opinion.
>> 
>> I agree it’s confusing, and in the patch I found that different code paths 
>> didn’t handle things correctly as we have some types (most) that support 
>> empty bytes, and some that do not…. Empty also has different meaning in 
>> different code paths; for most it means “null”, and for some other types it 
>> means “empty”…. To try to make things more clear I added 
>> org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
>> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each 
>> type can define if empty is null or not.
>> 
>>> I also think that it would be good to standardize on one approach to avoid 
>>> confusion.
>> 
>> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s 
>> say I have a “blob” type and I write an empty byte… what does this mean?  
>> What does it mean for "text" type?  The fact I get back a null in both those 
>> cases was very confusing to me… I do feel that some types should support 
>> empty, and the common code of empty == null I think is very brittle 
>> (blob/text was not correct in different places due to this...)… so I am cool 
>> with removing that relationship, but don’t think we should have a rule 
>> blocking empty for all current / future types as it some times does make 
>> sense.
>> 
>>> empty vector (I presume) for the vector type?
>> 
>> Empty vectors (vector[0]) are blocked at the type level, the smallest vector 
>> is vector[1]
>> 
>>> as types that can never be null
>> 
>> One pro here is that “null” is cheaper (in some regards) than delete (though 
>> we can never purge), but having 2 similar behaviors (write null, do a 
>> delete) at the type level is a bit confusing… Right now I am allowed to do 
>> the following (the below isn’t valid CQL, it’s a hybrid of CQL + Java code…)
>> 
>> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
>> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])
>> 
>> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY 
>> ((pk1, pk2)));
>> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new 
>> byte[0], new byte[0]) — valid as the partition key is not empty as its a 
>> composite of 2 empty values, this is the same as new byte[2]
>> 
>> The first time I ever found out that empty bytes was valid was when a user 
>> was trying to abuse this in collections (also the fact collections support 
>> null in some cases and not others is fun…)…. It was blowing up in 

Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread J. D. Jordan
When does empty mean null?  My understanding was that empty is a valid value 
for the types that support it, separate from null (aka a tombstone). Do we have 
types where writing an empty value creates a tombstone?

I agree with David that my preference would be for only blob and string like 
types to support empty. It’s too late for the existing types, but we should 
hold to this going forward. Which is what I think the idea was in 
https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was sad 
the existing numerics were emptiable, but too late to change, and we could 
correct it for newer types.

> On Sep 19, 2023, at 12:12 PM, David Capwell  wrote:
> 
> 
>> 
>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>> types non-emptiable. This approach makes more sense to me as having to deal 
>> with empty value is error prone in my opinion.
> 
> I agree it’s confusing, and in the patch I found that different code paths 
> didn’t handle things correctly as we have some types (most) that support 
> empty bytes, and some that do not…. Empty also has different meaning in 
> different code paths; for most it means “null”, and for some other types it 
> means “empty”…. To try to make things more clear I added 
> org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each 
> type can define if empty is null or not.
> 
>> I also think that it would be good to standardize on one approach to avoid 
>> confusion.
> 
> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s 
> say I have a “blob” type and I write an empty byte… what does this mean?  
> What does it mean for "text" type?  The fact I get back a null in both those 
> cases was very confusing to me… I do feel that some types should support 
> empty, and the common code of empty == null I think is very brittle 
> (blob/text was not correct in different places due to this...)… so I am cool 
> with removing that relationship, but don’t think we should have a rule 
> blocking empty for all current / future types as it sometimes does make 
> sense.
> 
>> empty vector (I presume) for the vector type?
> 
> Empty vectors (vector[0]) are blocked at the type level, the smallest vector 
> is vector[1]
> 
>> as types that can never be null
> 
> One pro here is that “null” is cheaper (in some regards) than delete (though 
> we can never purge), but having 2 similar behaviors (write null, do a delete) 
> at the type level is a bit confusing… Right now I am allowed to do the 
> following (the below isn’t valid CQL, it’s a hybrid of CQL + Java code…)
> 
> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])
> 
> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1, 
> pk2)));
> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new 
> byte[0], new byte[0]) — valid as the partition key is not empty as its a 
> composite of 2 empty values, this is the same as new byte[2]
> 
> The first time I ever found out that empty bytes was valid was when a user 
> was trying to abuse this in collections (also the fact collections support 
> null in some cases and not others is fun…)…. It was blowing up in random 
> places… good times!
> 
> I am personally not in favor of allowing empty bytes (other than for blob / 
> text as that is actually valid for the domain), but having similar types 
> having different semantics I feel is more problematic...
> 
>>> On Sep 19, 2023, at 8:56 AM, Josh McKenzie  wrote:
>>> 
>>> I am strongly in favour of permitting the table definition forbidding nulls 
>>> - and perhaps even defaulting to this behaviour. But I don’t think we 
>>> should have types that are inherently incapable of being null.
>> I'm with Benedict. Seems like this could help prevent whatever "nulls in 
>> primary key columns" problems Aleksey was alluding to on those tickets back 
>> in the day that pushed us towards making the new types non-emptiable as well 
>> (i.e. primary keys are non-null in table definition).
>> 
>> Furthering Alex' question, having a default value for unset fields in any 
>> non-collection context seems... quite surprising to me in a database. I 
>> could see the argument for making container / collection types non-nullable, 
>> maybe, but that just keeps us in a potential straddle case (some types 
>> nullable, some not).
>> 
>>> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>>> 
>>> If I understand this suggestion correctly it is a whole can of worms, as 
>>> types that can never be null prevent us ever supporting outer joins that 
>>> return these types.
>>> 
>>> I am strongly in favour of permitting the table definition forbidding nulls 
>>> - and perhaps even defaulting to this behaviour. But I don’t think we 
>>> should have types that are inherently incapable of being null. I also 
>>> certainly 

Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread David Capwell
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
> types non-emptiable. This approach makes more sense to me as having to deal 
> with empty value is error prone in my opinion.

I agree it’s confusing, and in the patch I found that different code paths 
didn’t handle things correctly as we have some types (most) that support empty 
bytes, and some that do not…. Empty also has different meaning in different 
code paths; for most it means “null”, and for some other types it means 
“empty”…. To try to make things more clear I added 
org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each 
type can define if empty is null or not.
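
A minimal sketch of what "each type decides whether empty is null" could look like. This is not the Cassandra code: the class, the EmptyPolicy enum, and the isNull(EmptyPolicy, ByteBuffer) signature are hypothetical stand-ins for AbstractType#isNull(V, ValueAccessor), and the three policies are only a reading of the behaviours described in this thread.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not Cassandra's AbstractType API. */
final class EmptyHandlingSketch
{
    /** How a type interprets a zero-length serialized value (assumed categories). */
    enum EmptyPolicy
    {
        EMPTY_IS_VALUE, // e.g. blob / text, where empty is a real value
        EMPTY_IS_NULL,  // legacy behaviour described for most older types
        EMPTY_REJECTED  // non-emptiable types such as tinyint / smallint
    }

    /** Returns true when the serialized value should be read back as null. */
    static boolean isNull(EmptyPolicy policy, ByteBuffer value)
    {
        if (value == null)
            return true;              // an absent value is always null
        if (value.hasRemaining())
            return false;             // any non-empty value is a real value
        switch (policy)
        {
            case EMPTY_IS_VALUE:
                return false;
            case EMPTY_IS_NULL:
                return true;
            default:
                throw new IllegalArgumentException("empty value is not allowed for this type");
        }
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println("blob-like:   " + isNull(EmptyPolicy.EMPTY_IS_VALUE, empty));
        System.out.println("legacy int:  " + isNull(EmptyPolicy.EMPTY_IS_NULL, empty));
    }
}
```

Keeping the decision in one place per type avoids scattering "empty == null" checks across code paths, which is the brittleness described above.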

> I also think that it would be good to standardize on one approach to avoid 
> confusion.

I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s say 
I have a “blob” type and I write an empty byte… what does this mean?  What does 
it mean for "text" type?  The fact I get back a null in both those cases was 
very confusing to me… I do feel that some types should support empty, and the 
common code of empty == null I think is very brittle (blob/text was not correct 
in different places due to this...)… so I am cool with removing that 
relationship, but don’t think we should have a rule blocking empty for all 
current / future types as it sometimes does make sense.

> empty vector (I presume) for the vector type?

Empty vectors (vector[0]) are blocked at the type level, the smallest vector is 
vector[1]

>  as types that can never be null

One pro here is that “null” is cheaper (in some regards) than delete (though we 
can never purge), but having 2 similar behaviors (write null, do a delete) at 
the type level is a bit confusing… Right now I am allowed to do the following 
(the below isn’t valid CQL, it’s a hybrid of CQL + Java code…)

CREATE TABLE fluffykittens (pk int primary key, cuteness int);
INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])

CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1, 
pk2)));
INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new byte[0], 
new byte[0]) — valid as the partition key is not empty as its a composite of 2 
empty values, this is the same as new byte[2]
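
A rough sketch of why a composite of empty components is itself non-empty. The layout used here (2-byte length, component bytes, 1-byte end-of-component marker) is an assumption made for illustration, and the class and method names are hypothetical; the exact byte count in Cassandra may differ from what this sketch produces. The only point is that framing bytes remain even when every component is zero-length.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of a length-prefixed composite key; layout is assumed. */
final class CompositeKeySketch
{
    /** Concatenates components, framing each with a length prefix and a trailing marker. */
    static ByteBuffer compose(ByteBuffer... components)
    {
        int size = 0;
        for (ByteBuffer c : components)
            size += 2 + c.remaining() + 1;      // length prefix + bytes + end marker

        ByteBuffer out = ByteBuffer.allocate(size);
        for (ByteBuffer c : components)
        {
            out.putShort((short) c.remaining());
            out.put(c.duplicate());             // duplicate so the caller's buffer is not consumed
            out.put((byte) 0);
        }
        out.flip();
        return out;
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer key = compose(empty, empty);
        System.out.println("composite is empty: " + !key.hasRemaining());
    }
}
```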

The first time I ever found out that empty bytes was valid was when a user was 
trying to abuse this in collections (also the fact collections support null in 
some cases and not others is fun…)…. It was blowing up in random places… good 
times!

I am personally not in favor of allowing empty bytes (other than for blob / 
text as that is actually valid for the domain), but having similar types having 
different semantics I feel is more problematic...

> On Sep 19, 2023, at 8:56 AM, Josh McKenzie  wrote:
> 
>> I am strongly in favour of permitting the table definition forbidding nulls 
>> - and perhaps even defaulting to this behaviour. But I don’t think we should 
>> have types that are inherently incapable of being null.
> I'm with Benedict. Seems like this could help prevent whatever "nulls in 
> primary key columns" problems Aleksey was alluding to on those tickets back 
> in the day that pushed us towards making the new types non-emptiable as well 
> (i.e. primary keys are non-null in table definition).
> 
> Furthering Alex' question, having a default value for unset fields in any 
> non-collection context seems... quite surprising to me in a database. I could 
> see the argument for making container / collection types non-nullable, maybe, 
> but that just keeps us in a potential straddle case (some types nullable, 
> some not).
> 
> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>> 
>> If I understand this suggestion correctly it is a whole can of worms, as 
>> types that can never be null prevent us ever supporting outer joins that 
>> return these types.
>> 
>> I am strongly in favour of permitting the table definition forbidding nulls 
>> - and perhaps even defaulting to this behaviour. But I don’t think we should 
>> have types that are inherently incapable of being null. I also certainly 
>> don’t think we should have bifurcated our behaviour between types like this.
>> 
>> 
>> 
>>> On 19 Sep 2023, at 11:54, Alex Petrov  wrote:
>>> 
>>> To make sure I understand this right; does that mean there will be a 
>>> default value for unset fields? Like 0 for numerical values, and an empty 
>>> vector (I presume) for the vector type?
>>> 
>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>>> Hi everybody,
>>>> 
>>>> I noticed that the new Vector type accepts empty ByteBuffer values as an 
>>>> input representing null.
>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>>>> types non-emptiable. This approach makes more sense to me as having to 
>>>> deal with empty value is error prone in my opinion.
>>>> I also think that it would be good to standardize on one 

Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread Josh McKenzie
> I am strongly in favour of permitting the table definition forbidding nulls - 
> and perhaps even defaulting to this behaviour. But I don’t think we should 
> have types that are inherently incapable of being null.
I'm with Benedict. Seems like this could help prevent whatever "nulls in 
primary key columns" problems Aleksey was alluding to on those tickets back in 
the day that pushed us towards making the new types non-emptiable as well (i.e. 
primary keys are non-null in table definition).

Furthering Alex' question, having a default value for unset fields in any 
non-collection context seems... quite surprising to me in a database. I could 
see the argument for making container / collection types non-nullable, maybe, 
but that just keeps us in a potential straddle case (some types nullable, some 
not).

On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
> 
> If I understand this suggestion correctly it is a whole can of worms, as 
> types that can never be null prevent us ever supporting outer joins that 
> return these types.
> 
> I am strongly in favour of permitting the table definition forbidding nulls - 
> and perhaps even defaulting to this behaviour. But I don’t think we should 
> have types that are inherently incapable of being null. I also certainly 
> don’t think we should have bifurcated our behaviour between types like this.
> 
> 
> 
>> On 19 Sep 2023, at 11:54, Alex Petrov  wrote:
>> 
>> To make sure I understand this right; does that mean there will be a default 
>> value for unset fields? Like 0 for numerical values, and an empty vector (I 
>> presume) for the vector type?
>> 
>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>> Hi everybody,
>>> 
>>> I noticed that the new Vector type accepts empty ByteBuffer values as an 
>>> input representing null.
>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>>> types non-emptiable. This approach makes more sense to me as having to 
>>> deal with empty value is error prone in my opinion.
>>> I also think that it would be good to standardize on one approach to avoid 
>>> confusion.
>>> 
>>> Should we make the Vector type non-emptiable and stick to it for the new 
>>> types?
>>> 
>>> I'd like to hear your opinion.
>> 


Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread Benedict
If I understand this suggestion correctly it is a whole can of worms, as types 
that can never be null prevent us ever supporting outer joins that return these 
types.

I am strongly in favour of permitting the table definition forbidding nulls - 
and perhaps even defaulting to this behaviour. But I don’t think we should have 
types that are inherently incapable of being null. I also certainly don’t think 
we should have bifurcated our behaviour between types like this.


> On 19 Sep 2023, at 11:54, Alex Petrov  wrote:
> 
> 
> To make sure I understand this right; does that mean there will be a default 
> value for unset fields? Like 0 for numerical values, and an empty vector (I 
> presume) for the vector type?
> 
>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>> Hi everybody,
>> 
>> I noticed that the new Vector type accepts empty ByteBuffer values as an 
>> input representing null.
>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>> types non-emptiable. This approach makes more sense to me as having to deal 
>> with empty value is error prone in my opinion.
>> I also think that it would be good to standardize on one approach to avoid 
>> confusion.
>> 
>> Should we make the Vector type non-emptiable and stick to it for the new 
>> types?
>> 
>> I'd like to hear your opinion.
> 


Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread Alex Petrov
To make sure I understand this right; does that mean there will be a default 
value for unset fields? Like 0 for numerical values, and an empty vector (I 
presume) for the vector type?

On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
> Hi everybody,
> 
> I noticed that the new Vector type accepts empty ByteBuffer values as an 
> input representing null.
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
> types non-emptiable. This approach makes more sense to me as having to deal 
> with empty value is error prone in my opinion.
> I also think that it would be good to standardize on one approach to avoid 
> confusion.
> 
> Should we make the Vector type non-emptiable and stick to it for the new 
> types?
> 
> I'd like to hear your opinion.


[DISCUSS] Vector type and empty value

2023-09-15 Thread Benjamin Lerer
Hi everybody,

I noticed that the new Vector type accepts empty ByteBuffer values as an
input representing null.
When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
types non-emptiable. This approach makes more sense to me as having to
deal with empty value is error prone in my opinion.
I also think that it would be good to standardize on one approach to avoid
confusion.

Should we make the Vector type non-emptiable and stick to it for the new
types?

I'd like to hear your opinion.
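
For illustration, a minimal sketch of what "non-emptiable" validation might look like for a fixed-dimension float vector. The class and method names are hypothetical and this is not the Cassandra VectorType code; it assumes 4-byte float elements and that an absent (null) value is handled separately from an empty one.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of a non-emptiable, fixed-dimension float vector check. */
final class NonEmptiableVectorSketch
{
    /**
     * Validates a serialized float vector of the given dimension.
     * A null reference means "no value"; a zero-length buffer is rejected
     * instead of being treated as null.
     */
    static void validate(ByteBuffer value, int dimension)
    {
        if (dimension < 1)
            throw new IllegalArgumentException("vector dimension must be at least 1");
        if (value == null)
            return; // absence is handled by the caller (assumption)

        int expected = dimension * Float.BYTES;
        if (value.remaining() != expected)
            throw new IllegalArgumentException(
                "expected " + expected + " bytes but got " + value.remaining());
    }

    public static void main(String[] args)
    {
        validate(ByteBuffer.allocate(3 * Float.BYTES), 3); // accepted
        try
        {
            validate(ByteBuffer.allocate(0), 3);           // empty buffer
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, an empty ByteBuffer fails fast at validation time rather than silently standing in for null.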