[jira] [Commented] (SPARK-10501) support UUID as an atomic type
[ https://issues.apache.org/jira/browse/SPARK-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559405#comment-15559405 ]

Hyukjin Kwon commented on SPARK-10501:
--------------------------------------

Ah, it was about the type, not the function. I read the JIRA too hastily. Thanks for the correction.

> support UUID as an atomic type
> ------------------------------
>
>                 Key: SPARK-10501
>                 URL: https://issues.apache.org/jira/browse/SPARK-10501
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Jon Haddad
>            Priority: Minor
>
> It's pretty common to use UUIDs instead of integers in order to avoid
> distributed counters.
> I've added this, which at least lets me load dataframes that use UUIDs that I
> can cast to strings:
> {code}
> class UUIDType(AtomicType):
>     pass
> _type_mappings[UUID] = UUIDType
> _atomic_types.append(UUIDType)
> {code}
> But if I try to do anything else with the UUIDs, like this:
> {code}
> ratings.select("userid").distinct().collect()
> {code}
> I get this pile of fun:
> {code}
> scala.MatchError: UUIDType (of class
> org.apache.spark.sql.cassandra.types.UUIDType$)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10501) support UUID as an atomic type
[ https://issues.apache.org/jira/browse/SPARK-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559271#comment-15559271 ]

Russell Spitzer commented on SPARK-10501:
-----------------------------------------

It's not that we need it as a unique identifier. UUID is already a data type in the Cassandra database, but there is no direct translation to a Spark SQL type, so a conversion to string must be done. In addition, TimeUUIDs require a custom non-bytewise comparator, so a greater-than or less-than lexical comparison of them does not give the correct ordering.

https://datastax-oss.atlassian.net/browse/SPARKC-405
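
To illustrate the TimeUUID point above, here is a minimal sketch using only Python's standard `uuid` module. The helper `make_timeuuid` is hypothetical (it is not part of Spark or the Cassandra connector), and the timestamps are arbitrary values chosen to expose the problem. It shows only that string order and timestamp order can disagree, because the low 32 bits of the timestamp come first in the textual form; it does not reproduce Cassandra's actual comparator.

```python
import uuid


def make_timeuuid(timestamp: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID from a 60-bit timestamp."""
    time_low = timestamp & 0xFFFFFFFF              # low 32 bits, printed first
    time_mid = (timestamp >> 32) & 0xFFFF          # middle 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | (1 << 12)  # high 12 bits + version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80     # RFC 4122 variant bits
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Two arbitrary timestamps, one tick apart, that straddle a 32-bit boundary.
earlier = make_timeuuid(0x1_FFFF_FFFF)
later = make_timeuuid(0x2_0000_0000)

print(earlier)  # ffffffff-0001-1000-8000-000000000000
print(later)    # 00000000-0002-1000-8000-000000000000

# Ordering by the embedded timestamp puts `earlier` first ...
by_time = sorted([later, earlier], key=lambda u: u.time)
# ... while ordering by the string form puts it last.
by_string = sorted([later, earlier], key=str)

print(by_time == [earlier, later])    # True
print(by_string == [later, earlier])  # True
```

This is why casting a TimeUUID column to string preserves equality checks such as `distinct()` but not range comparisons.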
[jira] [Commented] (SPARK-10501) support UUID as an atomic type
[ https://issues.apache.org/jira/browse/SPARK-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557820#comment-15557820 ]

Hyukjin Kwon commented on SPARK-10501:
--------------------------------------

Can we use {{monotonically_increasing_id}} instead?
[jira] [Commented] (SPARK-10501) support UUID as an atomic type
[ https://issues.apache.org/jira/browse/SPARK-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739228#comment-14739228 ]

Alex Liu commented on SPARK-10501:
----------------------------------

Cassandra has InetAddress and BigInteger types. Can we add those as internal types as well?