Re: [DISCUSS] Vector type and empty value
I have just created CASSANDRA-18876 for this. I'll post a patch very soon.

On Wed, 20 Sept 2023 at 19:41, David Capwell wrote:

> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility.
>
>
> Given that java driver has a different behavior from server, I wouldn’t be
> shocked to see that other drivers also have their own custom behaviors… so
> not clear how to migrate unless we actually hand a user facing standard per
> type… if all drivers use a “default value” and is consistent, I do think we
> could migrate, but would need to live with this till at least 6.0+
>
> We can only prevent its use in the CQL layer where support isn’t required.
>
>
> +1
>
> On Sep 20, 2023, at 7:38 AM, Benedict wrote:
>
> Yes, if this is what was meant by empty I agree. It’s nonsensical for most
> types. Apologies for any confusion.
>
> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility. We can only prevent its use in
> the CQL layer where support isn’t required. My understanding was that we
> had at least tried to do this for all non-thrift schemas, but perhaps we
> did not do so thoroughly and now may have some CQL legacy support
> requirements as well.
>
> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko wrote:
>
> Allowing zero-length byte arrays for most old types is just a legacy from
> Darker Days. It’s a distinct concern from columns being nullable or not.
>
> There are a couple types where this makes sense: strings and blobs. All
> else should not allow this except for backward compatibility reasons. So,
> not for new types.
>
> On 20 Sep 2023, at 00:08, David Capwell wrote:
>
> When does empty mean null?
> > > > Most types are this way > > @Test > public void nullExample() > { > createTable("CREATE TABLE %s (pk int primary key, cuteness int)"); > execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new > byte[0])); > Row result = execute("SELECT * FROM %s WHERE pk=0").one(); > if (result.has("cuteness")) System.out.println("Cuteness score: " + > result.getInt("cuteness")); > else System.out.println("Cuteness score is undefined"); > } > > > This test will NPE in getInt as the returned BB is seen as “null” for > int32 type, you can make it “safer” by changing to the following > > if (result.has("cuteness")) System.out.println("Cuteness score: " + > Int32Type.instance.compose(result.getBlob("cuteness"))); > > Now we get the log "Cuteness score: null” > > What’s even better (just found this out) is that client isn’t consistent > or correct in these cases! > > com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, > "SELECT * FROM %s WHERE pk=0").one(); > if (result.getBytesUnsafe("cuteness") != null) > System.out.println("Cuteness score: " + result.getInt("cuteness")); > else System.out.println("Cuteness score is undefined”); > > This prints "Cuteness score: 0” > > So for Cassandra we think the value is “null” but java driver thinks it’s > 0? > > Do we have types where writing an empty value creates a tombstone? > > > Empty does not generate a tombstone for any type, but empty has a similar > user experience as we return null in both cases (but just found out that > the drivers may not be consistent with this…) > > On Sep 19, 2023, at 3:33 PM, J. D. Jordan > wrote: > > > When does empty mean null? My understanding was that empty is a valid > value for the types that support it, separate from null (aka a tombstone). > Do we have types where writing an empty value creates a tombstone? > > I agree with David that my preference would be for only blob and string > like types to support empty. 
It’s too late for the existing types, but we > should hold to this going forward. Which is what I think the idea was in > https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it > was sad the existing numerics were emptiable, but too late to change, and > we could correct it for newer types. > > On Sep 19, 2023, at 12:12 PM, David Capwell wrote: > > > > > When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making > types non -emptiable. This approach makes more sense to me as having to > deal with empty value is error prone in my opinion. > > > I agree it’s confusing, and in the patch I found that different code paths > didn’t handle things correctly as we have some times (most) that support > empty bytes, and some that do not…. Empty also has different meaning in > different code paths; for most it means “null”, and for some other types it > means “empty”…. To try to make things more clear I added > org.apache.cassandra.db.marshal.AbstractType#isNull(V, > org.apache.cassandra.db.marshal.ValueAccessor) to the type system so > each type can define if empty is null or not. > > I also think that it would be good to standardize on one approach to avoid > confusion. > > > I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. > Let’s
Re: [DISCUSS] Vector type and empty value
> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility.

Given that the Java driver behaves differently from the server, I wouldn’t be shocked to see that other drivers also have their own custom behaviors… so it is not clear how to migrate unless we actually had a user-facing standard per type… if all drivers used a “default value” consistently, I do think we could migrate, but we would need to live with this until at least 6.0+

> We can only prevent its use in the CQL layer where support isn’t required.

+1

> On Sep 20, 2023, at 7:38 AM, Benedict wrote:
>
> Yes, if this is what was meant by empty I agree. It’s nonsensical for most
> types. Apologies for any confusion.
>
> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility. We can only prevent its use in the
> CQL layer where support isn’t required. My understanding was that we had at
> least tried to do this for all non-thrift schemas, but perhaps we did not do
> so thoroughly and now may have some CQL legacy support requirements as well.
>
>> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko wrote:
>>
>> Allowing zero-length byte arrays for most old types is just a legacy from
>> Darker Days. It’s a distinct concern from columns being nullable or not.
>>
>> There are a couple types where this makes sense: strings and blobs. All else
>> should not allow this except for backward compatibility reasons. So, not for
>> new types.
>>
>>> On 20 Sep 2023, at 00:08, David Capwell wrote:
>>>
>>>> When does empty mean null?
>>> >>> >>> Most types are this way >>> >>> @Test >>> public void nullExample() >>> { >>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)"); >>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new >>> byte[0])); >>> Row result = execute("SELECT * FROM %s WHERE pk=0").one(); >>> if (result.has("cuteness")) System.out.println("Cuteness score: " + >>> result.getInt("cuteness")); >>> else System.out.println("Cuteness score is undefined"); >>> } >>> >>> >>> This test will NPE in getInt as the returned BB is seen as “null” for int32 >>> type, you can make it “safer” by changing to the following >>> >>> if (result.has("cuteness")) System.out.println("Cuteness score: " + >>> Int32Type.instance.compose(result.getBlob("cuteness"))); >>> >>> Now we get the log "Cuteness score: null” >>> >>> What’s even better (just found this out) is that client isn’t consistent or >>> correct in these cases! >>> >>> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, >>> "SELECT * FROM %s WHERE pk=0").one(); >>> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness >>> score: " + result.getInt("cuteness")); >>> else System.out.println("Cuteness score is undefined”); >>> >>> This prints "Cuteness score: 0” >>> >>> So for Cassandra we think the value is “null” but java driver thinks it’s 0? >>> Do we have types where writing an empty value creates a tombstone? >>> >>> Empty does not generate a tombstone for any type, but empty has a similar >>> user experience as we return null in both cases (but just found out that >>> the drivers may not be consistent with this…) >>> > On Sep 19, 2023, at 3:33 PM, J. D. Jordan > wrote: When does empty mean null? My understanding was that empty is a valid value for the types that support it, separate from null (aka a tombstone). Do we have types where writing an empty value creates a tombstone? 
I agree with David that my preference would be for only blob and string like types to support empty. It’s too late for the existing types, but we should hold to this going forward. Which is what I think the idea was in https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was sad the existing numerics were emptiable, but too late to change, and we could correct it for newer types. > On Sep 19, 2023, at 12:12 PM, David Capwell wrote: > > >> >> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started >> making types non -emptiable. This approach makes more sense to me as >> having to deal with empty value is error prone in my opinion. > > I agree it’s confusing, and in the patch I found that different code > paths didn’t handle things correctly as we have some times (most) that > support empty bytes, and some that do not…. Empty also has different > meaning in different code paths; for most it means “null”, and for some > other types it means “empty”…. To try to make things more clear I added > org.apache.cassandra.db.marshal.AbstractType#isNull(V, > org.apache.cassandra.db.marshal.ValueAccessor) to the type system so > each type can define if empty is null or not. > >> I also think that it would be good to
Re: [DISCUSS] Vector type and empty value
> So, not for new types.

> Should we make the Vector type non-emptiable and stick to it for the new
> types?

Yep, works for me. We should also update the test org.apache.cassandra.db.marshal.AbstractTypeTest#empty to detect this for new types, by making org.apache.cassandra.db.marshal.AbstractType#allowsEmpty default to false and overriding it in all legacy types.

More than glad to review any patch to fix this issue!

> On Sep 20, 2023, at 7:16 AM, Aleksey Yeshchenko wrote:
>
> Allowing zero-length byte arrays for most old types is just a legacy from
> Darker Days. It’s a distinct concern from columns being nullable or not.
>
> There are a couple types where this makes sense: strings and blobs. All else
> should not allow this except for backward compatibility reasons. So, not for
> new types.
>
>> On 20 Sep 2023, at 00:08, David Capwell wrote:
>>
>>> When does empty mean null?
>>
>>
>> Most types are this way
>>
>> @Test
>> public void nullExample()
>> {
>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new
>> byte[0]));
>> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
>> if (result.has("cuteness")) System.out.println("Cuteness score: " +
>> result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined");
>> }
>>
>>
>> This test will NPE in getInt as the returned BB is seen as “null” for int32
>> type, you can make it “safer” by changing to the following
>>
>> if (result.has("cuteness")) System.out.println("Cuteness score: " +
>> Int32Type.instance.compose(result.getBlob("cuteness")));
>>
>> Now we get the log "Cuteness score: null”
>>
>> What’s even better (just found this out) is that client isn’t consistent or
>> correct in these cases!
>> >> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, >> "SELECT * FROM %s WHERE pk=0").one(); >> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness >> score: " + result.getInt("cuteness")); >> else System.out.println("Cuteness score is undefined”); >> >> This prints "Cuteness score: 0” >> >> So for Cassandra we think the value is “null” but java driver thinks it’s 0? >> >>> Do we have types where writing an empty value creates a tombstone? >> >> Empty does not generate a tombstone for any type, but empty has a similar >> user experience as we return null in both cases (but just found out that the >> drivers may not be consistent with this…) >> >>> On Sep 19, 2023, at 3:33 PM, J. D. Jordan wrote: >>> >>> When does empty mean null? My understanding was that empty is a valid >>> value for the types that support it, separate from null (aka a tombstone). >>> Do we have types where writing an empty value creates a tombstone? >>> >>> I agree with David that my preference would be for only blob and string >>> like types to support empty. It’s too late for the existing types, but we >>> should hold to this going forward. Which is what I think the idea was in >>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was >>> sad the existing numerics were emptiable, but too late to change, and we >>> could correct it for newer types. >>> On Sep 19, 2023, at 12:12 PM, David Capwell wrote: > > When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started > making types non -emptiable. This approach makes more sense to me as > having to deal with empty value is error prone in my opinion. I agree it’s confusing, and in the patch I found that different code paths didn’t handle things correctly as we have some times (most) that support empty bytes, and some that do not…. Empty also has different meaning in different code paths; for most it means “null”, and for some other types it means “empty”…. 
To try to make things more clear I added org.apache.cassandra.db.marshal.AbstractType#isNull(V, org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each type can define if empty is null or not. > I also think that it would be good to standardize on one approach to > avoid confusion. I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s say I have a “blob” type and I write an empty byte… what does this mean? What does it mean for "text" type? The fact I get back a null in both those cases was very confusing to me… I do feel that some types should support empty, and the common code of empty == null I think is very brittle (blob/text was not correct in different places due to this...)… so I am cool with removing that relationship, but don’t think we should have a rule blocking empty for all current / future types as it some times does make sense. > empty vector (I presume) for the vector type? Empty vectors (vector[0]) are
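
The "default to false and override in all legacy types" idea from this message can be sketched as follows. This is only an illustration under stated assumptions: `SketchType`, `LegacyIntType`, `SketchBlobType`, `SketchVectorType` and `EmptinessAudit` are hypothetical stand-ins, not the real `org.apache.cassandra.db.marshal` classes, and the audit method only mimics what a test such as `AbstractTypeTest#empty` might check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a CQL type; NOT the real AbstractType.
abstract class SketchType {
    final String name;

    SketchType(String name) { this.name = name; }

    // Proposed default: a type rejects zero-length values unless it opts in.
    boolean allowsEmpty() { return false; }
}

// Legacy numeric type: stays emptiable purely for backward compatibility.
final class LegacyIntType extends SketchType {
    LegacyIntType() { super("int"); }
    @Override boolean allowsEmpty() { return true; }
}

// Blob: an empty value is meaningful in its domain.
final class SketchBlobType extends SketchType {
    SketchBlobType() { super("blob"); }
    @Override boolean allowsEmpty() { return true; }
}

// New type: inherits the non-emptiable default without any extra code.
final class SketchVectorType extends SketchType {
    SketchVectorType() { super("vector"); }
}

final class EmptinessAudit {
    // Returns the names of types that accept empty values without being on
    // the explicit allow-list, so a newly added emptiable type is flagged.
    static List<String> unexpectedlyEmptiable(List<SketchType> types, Set<String> allowed) {
        List<String> offenders = new ArrayList<>();
        for (SketchType type : types) {
            if (type.allowsEmpty() && !allowed.contains(type.name)) {
                offenders.add(type.name);
            }
        }
        return offenders;
    }
}
```

With an allow-list of `int` and `blob` the audit returns an empty list; removing `int` from the allow-list makes it report `int`, which is the kind of signal a test could use to catch a new type that quietly opts in to empty values.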
Re: [DISCUSS] Vector type and empty value
Allowing zero-length byte arrays for most old types is just a legacy from Darker Days. It’s a distinct concern from columns being nullable or not. There are a couple types where this makes sense: strings and blobs. All else should not allow this except for backward compatibility reasons. So, not for new types. > On 20 Sep 2023, at 00:08, David Capwell wrote: > >> When does empty mean null? > > > Most types are this way > > @Test > public void nullExample() > { > createTable("CREATE TABLE %s (pk int primary key, cuteness int)"); > execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new > byte[0])); > Row result = execute("SELECT * FROM %s WHERE pk=0").one(); > if (result.has("cuteness")) System.out.println("Cuteness score: " + > result.getInt("cuteness")); > else System.out.println("Cuteness score is undefined"); > } > > > This test will NPE in getInt as the returned BB is seen as “null” for int32 > type, you can make it “safer” by changing to the following > > if (result.has("cuteness")) System.out.println("Cuteness score: " + > Int32Type.instance.compose(result.getBlob("cuteness"))); > > Now we get the log "Cuteness score: null” > > What’s even better (just found this out) is that client isn’t consistent or > correct in these cases! > > com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, > "SELECT * FROM %s WHERE pk=0").one(); > if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness > score: " + result.getInt("cuteness")); > else System.out.println("Cuteness score is undefined”); > > This prints "Cuteness score: 0” > > So for Cassandra we think the value is “null” but java driver thinks it’s 0? > >> Do we have types where writing an empty value creates a tombstone? > > Empty does not generate a tombstone for any type, but empty has a similar > user experience as we return null in both cases (but just found out that the > drivers may not be consistent with this…) > >> On Sep 19, 2023, at 3:33 PM, J. D. 
Jordan wrote: >> >> When does empty mean null? My understanding was that empty is a valid value >> for the types that support it, separate from null (aka a tombstone). Do we >> have types where writing an empty value creates a tombstone? >> >> I agree with David that my preference would be for only blob and string like >> types to support empty. It’s too late for the existing types, but we should >> hold to this going forward. Which is what I think the idea was in >> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was >> sad the existing numerics were emptiable, but too late to change, and we >> could correct it for newer types. >> >>> On Sep 19, 2023, at 12:12 PM, David Capwell wrote: >>> >>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making types non -emptiable. This approach makes more sense to me as having to deal with empty value is error prone in my opinion. >>> >>> I agree it’s confusing, and in the patch I found that different code paths >>> didn’t handle things correctly as we have some times (most) that support >>> empty bytes, and some that do not…. Empty also has different meaning in >>> different code paths; for most it means “null”, and for some other types it >>> means “empty”…. To try to make things more clear I added >>> org.apache.cassandra.db.marshal.AbstractType#isNull(V, >>> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so >>> each type can define if empty is null or not. >>> I also think that it would be good to standardize on one approach to avoid confusion. >>> >>> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s >>> say I have a “blob” type and I write an empty byte… what does this mean? >>> What does it mean for "text" type? 
The fact I get back a null in both >>> those cases was very confusing to me… I do feel that some types should >>> support empty, and the common code of empty == null I think is very brittle >>> (blob/text was not correct in different places due to this...)… so I am >>> cool with removing that relationship, but don’t think we should have a rule >>> blocking empty for all current / future types as it some times does make >>> sense. >>> empty vector (I presume) for the vector type? >>> >>> Empty vectors (vector[0]) are blocked at the type level, the smallest >>> vector is vector[1] >>> as types that can never be null >>> >>> One pro here is that “null” is cheaper (in some regards) than delete >>> (though we can never purge), but having 2 similar behaviors (write null, do >>> a delete) at the type level is a bit confusing… Right now I am allowed to >>> do the following (the below isn’t valid CQL, its a hybrid of CQL + Java >>> code…) >>> >>> CREATE TABLE fluffykittens (pk int primary key, cuteness int); >>> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]) >>> >>> CREATE TABLE
Re: [DISCUSS] Vector type and empty value
Yes, if this is what was meant by empty I agree. It’s nonsensical for most types. Apologies for any confusion. I don’t think we can readily migrate old types away from this however, without breaking backwards compatibility. We can only prevent its use in the CQL layer where support isn’t required. My understanding was that we had at least tried to do this for all non-thrift schemas, but perhaps we did not do so thoroughly and now may have some CQL legacy support requirements as well. > On 20 Sep 2023, at 15:30, Aleksey Yeshchenko wrote: > > Allowing zero-length byte arrays for most old types is just a legacy from > Darker Days. It’s a distinct concern from columns being nullable or not. > > There are a couple types where this makes sense: strings and blobs. All else > should not allow this except for backward compatibility reasons. So, not for > new types. > >>> On 20 Sep 2023, at 00:08, David Capwell wrote: >>> >>> When does empty mean null? >> >> >> Most types are this way >> >> @Test >> public void nullExample() >> { >> createTable("CREATE TABLE %s (pk int primary key, cuteness int)"); >> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new >> byte[0])); >> Row result = execute("SELECT * FROM %s WHERE pk=0").one(); >> if (result.has("cuteness")) System.out.println("Cuteness score: " + >> result.getInt("cuteness")); >> else System.out.println("Cuteness score is undefined"); >> } >> >> >> This test will NPE in getInt as the returned BB is seen as “null” for int32 >> type, you can make it “safer” by changing to the following >> >> if (result.has("cuteness")) System.out.println("Cuteness score: " + >> Int32Type.instance.compose(result.getBlob("cuteness"))); >> >> Now we get the log "Cuteness score: null” >> >> What’s even better (just found this out) is that client isn’t consistent or >> correct in these cases! 
>> >> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, >> "SELECT * FROM %s WHERE pk=0").one(); >> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness >> score: " + result.getInt("cuteness")); >> else System.out.println("Cuteness score is undefined”); >> >> This prints "Cuteness score: 0” >> >> So for Cassandra we think the value is “null” but java driver thinks it’s 0? >> >>> Do we have types where writing an empty value creates a tombstone? >> >> Empty does not generate a tombstone for any type, but empty has a similar >> user experience as we return null in both cases (but just found out that the >> drivers may not be consistent with this…) >> On Sep 19, 2023, at 3:33 PM, J. D. Jordan wrote: >>> >>> When does empty mean null? My understanding was that empty is a valid >>> value for the types that support it, separate from null (aka a tombstone). >>> Do we have types where writing an empty value creates a tombstone? >>> >>> I agree with David that my preference would be for only blob and string >>> like types to support empty. It’s too late for the existing types, but we >>> should hold to this going forward. Which is what I think the idea was in >>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was >>> sad the existing numerics were emptiable, but too late to change, and we >>> could correct it for newer types. >>> On Sep 19, 2023, at 12:12 PM, David Capwell wrote: > > When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started > making types non -emptiable. This approach makes more sense to me as > having to deal with empty value is error prone in my opinion. I agree it’s confusing, and in the patch I found that different code paths didn’t handle things correctly as we have some times (most) that support empty bytes, and some that do not…. Empty also has different meaning in different code paths; for most it means “null”, and for some other types it means “empty”…. 
To try to make things more clear I added org.apache.cassandra.db.marshal.AbstractType#isNull(V, org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each type can define if empty is null or not. > I also think that it would be good to standardize on one approach to > avoid confusion. I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s say I have a “blob” type and I write an empty byte… what does this mean? What does it mean for "text" type? The fact I get back a null in both those cases was very confusing to me… I do feel that some types should support empty, and the common code of empty == null I think is very brittle (blob/text was not correct in different places due to this...)… so I am cool with removing that relationship, but don’t think we should have a rule blocking empty for all current / future types as it some times does make sense. > empty vector
Re: [DISCUSS] Vector type and empty value
> When does empty mean null?

Most types are this way

@Test
public void nullExample()
{
    createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
    execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new byte[0]));
    Row result = execute("SELECT * FROM %s WHERE pk=0").one();
    if (result.has("cuteness")) System.out.println("Cuteness score: " + result.getInt("cuteness"));
    else System.out.println("Cuteness score is undefined");
}

This test will throw an NPE in getInt, as the returned ByteBuffer is seen as “null” for the int32 type. You can make it “safer” by changing it to the following:

    if (result.has("cuteness")) System.out.println("Cuteness score: " + Int32Type.instance.compose(result.getBlob("cuteness")));

Now we get the log "Cuteness score: null"

What’s even better (just found this out) is that the client isn’t consistent or correct in these cases!

    com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, "SELECT * FROM %s WHERE pk=0").one();
    if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness score: " + result.getInt("cuteness"));
    else System.out.println("Cuteness score is undefined");

This prints "Cuteness score: 0"

So for Cassandra we think the value is “null” but the Java driver thinks it’s 0?

> Do we have types where writing an empty value creates a tombstone?

Empty does not generate a tombstone for any type, but empty gives a similar user experience, as we return null in both cases (but I just found out that the drivers may not be consistent with this…)

> On Sep 19, 2023, at 3:33 PM, J. D. Jordan wrote:
>
> When does empty mean null? My understanding was that empty is a valid value
> for the types that support it, separate from null (aka a tombstone). Do we
> have types where writing an empty value creates a tombstone?
>
> I agree with David that my preference would be for only blob and string like
> types to support empty. It’s too late for the existing types, but we should
> hold to this going forward.
Which is what I think the idea was in > https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was > sad the existing numerics were emptiable, but too late to change, and we > could correct it for newer types. > >> On Sep 19, 2023, at 12:12 PM, David Capwell wrote: >> >> >>> >>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making >>> types non -emptiable. This approach makes more sense to me as having to >>> deal with empty value is error prone in my opinion. >> >> I agree it’s confusing, and in the patch I found that different code paths >> didn’t handle things correctly as we have some times (most) that support >> empty bytes, and some that do not…. Empty also has different meaning in >> different code paths; for most it means “null”, and for some other types it >> means “empty”…. To try to make things more clear I added >> org.apache.cassandra.db.marshal.AbstractType#isNull(V, >> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each >> type can define if empty is null or not. >> >>> I also think that it would be good to standardize on one approach to avoid >>> confusion. >> >> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s >> say I have a “blob” type and I write an empty byte… what does this mean? >> What does it mean for "text" type? The fact I get back a null in both those >> cases was very confusing to me… I do feel that some types should support >> empty, and the common code of empty == null I think is very brittle >> (blob/text was not correct in different places due to this...)… so I am cool >> with removing that relationship, but don’t think we should have a rule >> blocking empty for all current / future types as it some times does make >> sense. >> >>> empty vector (I presume) for the vector type? 
>> >> Empty vectors (vector[0]) are blocked at the type level, the smallest vector >> is vector[1] >> >>> as types that can never be null >> >> One pro here is that “null” is cheaper (in some regards) than delete (though >> we can never purge), but having 2 similar behaviors (write null, do a >> delete) at the type level is a bit confusing… Right now I am allowed to do >> the following (the below isn’t valid CQL, its a hybrid of CQL + Java code…) >> >> CREATE TABLE fluffykittens (pk int primary key, cuteness int); >> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]) >> >> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY >> ((pk1, pk2)); >> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new >> byte[0], new byte[0]) — valid as the partition key is not empty as its a >> composite of 2 empty values, this is the same as new byte[2] >> >> The first time I ever found out that empty bytes was valid was when a user >> was trying to abuse this in collections (also the fact collections support >> null in some cases and not others is fun…)…. It was blowing up in
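
The server/driver mismatch described in this message (null on one side, 0 on the other) can be modelled in plain Java. This is a minimal sketch, assuming the two behaviours reported above; `EmptyIntDecoding` and both method names are hypothetical and do not reproduce the actual code of `Int32Type` or of the Java driver.

```java
import java.nio.ByteBuffer;

final class EmptyIntDecoding {
    // Models the server-side behaviour reported in the thread:
    // a zero-length buffer for an int column is surfaced as null.
    static Integer composeEmptyAsNull(ByteBuffer value) {
        if (value == null || value.remaining() == 0) {
            return null;
        }
        return value.getInt(value.position());
    }

    // Models the driver-side behaviour reported in the thread:
    // a primitive accessor cannot express null, so empty becomes 0.
    static int getIntDefaultingToZero(ByteBuffer value) {
        if (value == null || value.remaining() == 0) {
            return 0;
        }
        return value.getInt(value.position());
    }
}
```

Both methods agree on a real four-byte value and disagree only on the zero-length buffer, which is why the same row can read as "undefined" through one path and as a score of 0 through the other.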
Re: [DISCUSS] Vector type and empty value
When does empty mean null? My understanding was that empty is a valid value for the types that support it, separate from null (aka a tombstone). Do we have types where writing an empty value creates a tombstone?

I agree with David that my preference would be for only blob and string-like types to support empty. It’s too late for the existing types, but we should hold to this going forward. Which is what I think the idea was in https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was sad the existing numerics were emptiable, but too late to change, and we could correct it for newer types.

> On Sep 19, 2023, at 12:12 PM, David Capwell wrote:
>
>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
>> types non-emptiable. This approach makes more sense to me as having to deal
>> with empty value is error prone in my opinion.
>
> I agree it’s confusing, and in the patch I found that different code paths
> didn’t handle things correctly as we have some types (most) that support
> empty bytes, and some that do not…. Empty also has different meaning in
> different code paths; for most it means “null”, and for some other types it
> means “empty”…. To try to make things more clear I added
> org.apache.cassandra.db.marshal.AbstractType#isNull(V,
> org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each
> type can define if empty is null or not.
>
>> I also think that it would be good to standardize on one approach to avoid
>> confusion.
>
> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s
> say I have a “blob” type and I write an empty byte… what does this mean?
> What does it mean for "text" type? The fact I get back a null in both those
> cases was very confusing to me… I do feel that some types should support
> empty, and the common code of empty == null I think is very brittle
> (blob/text was not correct in different places due to this...)… so I am cool
> with removing that relationship, but don’t think we should have a rule
> blocking empty for all current / future types as it sometimes does make
> sense.
>
>> empty vector (I presume) for the vector type?
>
> Empty vectors (vector[0]) are blocked at the type level, the smallest vector
> is vector[1]
>
>> as types that can never be null
>
> One pro here is that “null” is cheaper (in some regards) than delete (though
> we can never purge), but having 2 similar behaviors (write null, do a delete)
> at the type level is a bit confusing… Right now I am allowed to do the
> following (the below isn’t valid CQL, it’s a hybrid of CQL + Java code…)
>
> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])
>
> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1,
> pk2)));
> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new
> byte[0], new byte[0])
> -- valid as the partition key is not empty as it’s a composite of 2 empty
> -- values, this is the same as new byte[2]
>
> The first time I ever found out that empty bytes was valid was when a user
> was trying to abuse this in collections (also the fact collections support
> null in some cases and not others is fun…)…. It was blowing up in random
> places… good times!
>
> I am personally not in favor of allowing empty bytes (other than for blob /
> text as that is actually valid for the domain), but having similar types
> having different semantics I feel is more problematic...
>
>>> On Sep 19, 2023, at 8:56 AM, Josh McKenzie wrote:
>>>
>>> I am strongly in favour of permitting the table definition forbidding nulls
>>> - and perhaps even defaulting to this behaviour. But I don’t think we
>>> should have types that are inherently incapable of being null.
>> I'm with Benedict. Seems like this could help prevent whatever "nulls in
>> primary key columns" problems Aleksey was alluding to on those tickets back
>> in the day that pushed us towards making the new types non-emptiable as well
>> (i.e. primary keys are non-null in table definition).
>>
>> Furthering Alex' question, having a default value for unset fields in any
>> non-collection context seems... quite surprising to me in a database. I
>> could see the argument for making container / collection types non-nullable,
>> maybe, but that just keeps us in a potential straddle case (some types
>> nullable, some not).
>>
>>> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>>>
>>> If I understand this suggestion correctly it is a whole can of worms, as
>>> types that can never be null prevent us ever supporting outer joins that
>>> return these types.
>>>
>>> I am strongly in favour of permitting the table definition forbidding nulls
>>> - and perhaps even defaulting to this behaviour. But I don’t think we
>>> should have types that are inherently incapable of being null. I also
>>> certainly
Re: [DISCUSS] Vector type and empty value
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
> types non-emptiable. This approach makes more sense to me as having to deal
> with empty value is error prone in my opinion.

I agree it’s confusing, and in the patch I found that different code paths didn’t handle things correctly as we have some types (most) that support empty bytes, and some that do not…. Empty also has different meaning in different code paths; for most it means “null”, and for some other types it means “empty”…. To try to make things more clear I added org.apache.cassandra.db.marshal.AbstractType#isNull(V, org.apache.cassandra.db.marshal.ValueAccessor) to the type system so each type can define if empty is null or not.

> I also think that it would be good to standardize on one approach to avoid
> confusion.

I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s say I have a “blob” type and I write an empty byte… what does this mean? What does it mean for "text" type? The fact I get back a null in both those cases was very confusing to me… I do feel that some types should support empty, and the common code of empty == null I think is very brittle (blob/text was not correct in different places due to this...)… so I am cool with removing that relationship, but don’t think we should have a rule blocking empty for all current / future types as it sometimes does make sense.

> empty vector (I presume) for the vector type?

Empty vectors (vector[0]) are blocked at the type level, the smallest vector is vector[1]

> as types that can never be null

One pro here is that “null” is cheaper (in some regards) than delete (though we can never purge), but having 2 similar behaviors (write null, do a delete) at the type level is a bit confusing… Right now I am allowed to do the following (the below isn’t valid CQL, it’s a hybrid of CQL + Java code…)

CREATE TABLE fluffykittens (pk int primary key, cuteness int);
INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])

CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1, pk2)));
INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new byte[0], new byte[0])
-- valid as the partition key is not empty as it’s a composite of 2 empty values, this is the same as new byte[2]

The first time I ever found out that empty bytes was valid was when a user was trying to abuse this in collections (also the fact collections support null in some cases and not others is fun…)…. It was blowing up in random places… good times!

I am personally not in favor of allowing empty bytes (other than for blob / text as that is actually valid for the domain), but having similar types having different semantics I feel is more problematic...

> On Sep 19, 2023, at 8:56 AM, Josh McKenzie wrote:
>
>> I am strongly in favour of permitting the table definition forbidding nulls
>> - and perhaps even defaulting to this behaviour. But I don’t think we should
>> have types that are inherently incapable of being null.
> I'm with Benedict. Seems like this could help prevent whatever "nulls in
> primary key columns" problems Aleksey was alluding to on those tickets back
> in the day that pushed us towards making the new types non-emptiable as well
> (i.e. primary keys are non-null in table definition).
>
> Furthering Alex' question, having a default value for unset fields in any
> non-collection context seems... quite surprising to me in a database. I could
> see the argument for making container / collection types non-nullable, maybe,
> but that just keeps us in a potential straddle case (some types nullable,
> some not).
>
> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>>
>> If I understand this suggestion correctly it is a whole can of worms, as
>> types that can never be null prevent us ever supporting outer joins that
>> return these types.
>>
>> I am strongly in favour of permitting the table definition forbidding nulls
>> - and perhaps even defaulting to this behaviour. But I don’t think we should
>> have types that are inherently incapable of being null. I also certainly
>> don’t think we should have bifurcated our behaviour between types like this.
>>
>>> On 19 Sep 2023, at 11:54, Alex Petrov wrote:
>>>
>>> To make sure I understand this right; does that mean there will be a
>>> default value for unset fields? Like 0 for numerical values, and an empty
>>> vector (I presume) for the vector type?
>>>
>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>>> Hi everybody,
>>>>
>>>> I noticed that the new Vector type accepts empty ByteBuffer values as an
>>>> input representing null.
>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
>>>> types non-emptiable. This approach makes more sense to me as having to
>>>> deal with empty value is error prone in my opinion.
>>>> I also think that it would be good to standardize on one
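For readers following the thread, here is a minimal sketch of the idea David describes in the message above, where each type decides for itself whether an empty (zero-length) value means null. Everything in it is hypothetical: the class, the enum, and the int / blob / smallint policy assignments are illustrative assumptions, not Cassandra's actual AbstractType API. The blob policy shows the behaviour David argues for (empty is a real value), which is not necessarily what the code does today.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these names are Cassandra's real classes.
final class EmptyIsNullSketch
{
    /** How a type interprets a zero-length serialized value. */
    enum EmptyPolicy { EMPTY_IS_NULL, EMPTY_IS_VALUE, EMPTY_IS_REJECTED }

    /** Minimal stand-in for a column type that owns its empty-vs-null rule. */
    static final class SketchType
    {
        final String name;
        final EmptyPolicy policy;

        SketchType(String name, EmptyPolicy policy)
        {
            this.name = name;
            this.policy = policy;
        }

        /** Rejects zero-length input for types that do not allow it. */
        void validate(ByteBuffer value)
        {
            if (value != null && !value.hasRemaining() && policy == EmptyPolicy.EMPTY_IS_REJECTED)
                throw new IllegalArgumentException(name + " does not accept empty values");
        }

        /** True when the value is absent, or is empty and this type reads empty as null. */
        boolean isNull(ByteBuffer value)
        {
            if (value == null)
                return true;
            return !value.hasRemaining() && policy == EmptyPolicy.EMPTY_IS_NULL;
        }
    }

    // Assumed policies, chosen only to mirror the three cases discussed in the thread.
    static final SketchType INT = new SketchType("int", EmptyPolicy.EMPTY_IS_NULL);
    static final SketchType BLOB = new SketchType("blob", EmptyPolicy.EMPTY_IS_VALUE);
    static final SketchType SMALLINT = new SketchType("smallint", EmptyPolicy.EMPTY_IS_REJECTED);

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
        System.out.println("int reads empty as null:  " + INT.isNull(empty));
        System.out.println("blob reads empty as null: " + BLOB.isNull(empty));
        try
        {
            SMALLINT.validate(empty);
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the rule on the type, rather than in shared "empty == null" code, is what lets blob / text and the numeric types differ without every caller having to know which is which.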
Re: [DISCUSS] Vector type and empty value
> I am strongly in favour of permitting the table definition forbidding nulls -
> and perhaps even defaulting to this behaviour. But I don’t think we should
> have types that are inherently incapable of being null.

I'm with Benedict. Seems like this could help prevent whatever "nulls in primary key columns" problems Aleksey was alluding to on those tickets back in the day that pushed us towards making the new types non-emptiable as well (i.e. primary keys are non-null in table definition).

Furthering Alex' question, having a default value for unset fields in any non-collection context seems... quite surprising to me in a database. I could see the argument for making container / collection types non-nullable, maybe, but that just keeps us in a potential straddle case (some types nullable, some not).

On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>
> If I understand this suggestion correctly it is a whole can of worms, as
> types that can never be null prevent us ever supporting outer joins that
> return these types.
>
> I am strongly in favour of permitting the table definition forbidding nulls -
> and perhaps even defaulting to this behaviour. But I don’t think we should
> have types that are inherently incapable of being null. I also certainly
> don’t think we should have bifurcated our behaviour between types like this.
>
>> On 19 Sep 2023, at 11:54, Alex Petrov wrote:
>>
>> To make sure I understand this right; does that mean there will be a default
>> value for unset fields? Like 0 for numerical values, and an empty vector (I
>> presume) for the vector type?
>>
>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>> Hi everybody,
>>>
>>> I noticed that the new Vector type accepts empty ByteBuffer values as an
>>> input representing null.
>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
>>> types non-emptiable. This approach makes more sense to me as having to
>>> deal with empty value is error prone in my opinion.
>>> I also think that it would be good to standardize on one approach to avoid
>>> confusion.
>>>
>>> Should we make the Vector type non-emptiable and stick to it for the new
>>> types?
>>>
>>> I like to hear your opinion.
Re: [DISCUSS] Vector type and empty value
If I understand this suggestion correctly it is a whole can of worms, as types that can never be null prevent us ever supporting outer joins that return these types.

I am strongly in favour of permitting the table definition forbidding nulls - and perhaps even defaulting to this behaviour. But I don’t think we should have types that are inherently incapable of being null. I also certainly don’t think we should have bifurcated our behaviour between types like this.

> On 19 Sep 2023, at 11:54, Alex Petrov wrote:
>
> To make sure I understand this right; does that mean there will be a default
> value for unset fields? Like 0 for numerical values, and an empty vector (I
> presume) for the vector type?
>
>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>> Hi everybody,
>>
>> I noticed that the new Vector type accepts empty ByteBuffer values as an
>> input representing null.
>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
>> types non-emptiable. This approach makes more sense to me as having to deal
>> with empty value is error prone in my opinion.
>> I also think that it would be good to standardize on one approach to avoid
>> confusion.
>>
>> Should we make the Vector type non-emptiable and stick to it for the new
>> types?
>>
>> I like to hear your opinion.
Re: [DISCUSS] Vector type and empty value
To make sure I understand this right: does that mean there will be a default value for unset fields? Like 0 for numerical values, and an empty vector (I presume) for the vector type?

On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
> Hi everybody,
>
> I noticed that the new Vector type accepts empty ByteBuffer values as an
> input representing null.
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
> types non-emptiable. This approach makes more sense to me as having to deal
> with empty value is error prone in my opinion.
> I also think that it would be good to standardize on one approach to avoid
> confusion.
>
> Should we make the Vector type non-emptiable and stick to it for the new
> types?
>
> I like to hear your opinion.
[DISCUSS] Vector type and empty value
Hi everybody,

I noticed that the new Vector type accepts empty ByteBuffer values as an input representing null.
When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making types non-emptiable. This approach makes more sense to me, as having to deal with empty values is error-prone in my opinion.
I also think that it would be good to standardize on one approach to avoid confusion.

Should we make the Vector type non-emptiable and stick to it for the new types?

I would like to hear your opinion.
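To make the proposal concrete, below is a rough sketch of what "non-emptiable" could mean for a fixed-dimension float vector: a zero-length buffer is rejected outright instead of being read as null. This is only an illustration under stated assumptions. The class and method names are hypothetical, and the layout (dimension × 4 bytes of big-endian floats) is assumed for the example rather than taken from the actual Vector type implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a non-emptiable vector codec; not Cassandra's real VectorType.
final class NonEmptiableVectorSketch
{
    /** Serializes the floats back to back (assumed layout: 4 bytes per element). */
    static ByteBuffer encode(float[] values)
    {
        if (values.length < 1)
            throw new IllegalArgumentException("a vector needs at least 1 element");
        ByteBuffer out = ByteBuffer.allocate(values.length * Float.BYTES);
        for (float v : values)
            out.putFloat(v);
        out.flip(); // make the written bytes readable
        return out;
    }

    /** Decodes a vector of the given dimension; empty input is an error, not null. */
    static float[] decode(ByteBuffer bytes, int dimension)
    {
        if (dimension < 1)
            throw new IllegalArgumentException("dimension must be at least 1");
        if (bytes == null)
            return null; // a genuinely absent value stays null
        if (!bytes.hasRemaining())
            throw new IllegalArgumentException("empty value is not allowed for a vector");
        int expected = dimension * Float.BYTES;
        if (bytes.remaining() != expected)
            throw new IllegalArgumentException("expected " + expected + " bytes but got " + bytes.remaining());

        ByteBuffer view = bytes.duplicate(); // leave the caller's position untouched
        float[] out = new float[dimension];
        for (int i = 0; i < dimension; i++)
            out[i] = view.getFloat();
        return out;
    }

    public static void main(String[] args)
    {
        float[] roundTrip = decode(encode(new float[]{ 1.5f, -2.0f }), 2);
        System.out.println(roundTrip[0] + ", " + roundTrip[1]);
        try
        {
            decode(ByteBuffer.wrap(new byte[0]), 2);
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, null and empty can never be confused for the type: null means "no value", and a zero-length buffer is simply invalid input.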