[jira] [Assigned] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL
[ https://issues.apache.org/jira/browse/SPARK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-19618:
-----------------------------------

    Assignee: Tejas Patil

> Inconsistency wrt max. buckets allowed from Dataframe API vs SQL
> ----------------------------------------------------------------
>
>                 Key: SPARK-19618
>                 URL: https://issues.apache.org/jira/browse/SPARK-19618
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>             Fix For: 2.2.0
>
> A high number of buckets is allowed when creating a table via a SQL query:
> {code}
> sparkSession.sql("""
>   CREATE TABLE bucketed_table(col1 INT) USING parquet
>   CLUSTERED BY (col1) SORTED BY (col1) INTO 147483647 BUCKETS
> """)
> sparkSession.sql("DESC FORMATTED bucketed_table").collect.foreach(println)
>
> [Num Buckets:,147483647,]
> [Bucket Columns:,[col1],]
> [Sort Columns:,[col1],]
>
> {code}
> Trying the same via the DataFrame API does not work:
> {code}
> df.write.format("orc").bucketBy(147483647, "j", "k").sortBy("j", "k").saveAsTable("bucketed_table")
>
> java.lang.IllegalArgumentException: requirement failed: Bucket number must be greater than 0 and less than 100000.
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:293)
>   at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:291)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.spark.sql.DataFrameWriter.getBucketSpec(DataFrameWriter.scala:291)
>   at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:429)
>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:410)
>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:365)
>   ... 50 elided
> {code}
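The inconsistency comes from where the bucket count is validated: `DataFrameWriter.getBucketSpec` enforces an upper bound via `require`, while the SQL `CREATE TABLE ... INTO n BUCKETS` path builds its bucketing spec without an equivalent check. Below is a minimal sketch of one way to make both entry points consistent, by validating once in the shared spec type. `BucketSpec` here is a simplified stand-in for Spark's internal class, and the demo object is hypothetical; this is illustrative, not the committed patch:

{code}
// Sketch only: a simplified stand-in for Spark's internal bucketing spec,
// shown to illustrate validating the bucket count once in the shared data
// structure so that both the SQL DDL path and DataFrameWriter.bucketBy
// hit the same check.
case class BucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // Same bound for every caller; 100000 mirrors the limit that
  // DataFrameWriter already enforced in 2.1.
  require(numBuckets > 0 && numBuckets < 100000,
    s"Bucket number must be greater than 0 and less than 100000. Got $numBuckets")
}

// Hypothetical demo, not part of Spark.
object BucketSpecDemo {
  def main(args: Array[String]): Unit = {
    // Succeeds: a reasonable bucket count.
    println(BucketSpec(8, Seq("col1"), Seq("col1")))

    // Throws IllegalArgumentException from the shared check,
    // regardless of which API produced the spec.
    BucketSpec(147483647, Seq("col1"), Seq("col1"))
  }
}
{code}

With the check in the constructor, any code path that builds a bucketing spec, whether SQL DDL or `bucketBy`, fails fast with the same message.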
[jira] [Assigned] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL
[ https://issues.apache.org/jira/browse/SPARK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19618:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL
[ https://issues.apache.org/jira/browse/SPARK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19618:
------------------------------------

    Assignee: Apache Spark