[ https://issues.apache.org/jira/browse/SPARK-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260343#comment-14260343 ]
Apache Spark commented on SPARK-4968:
-------------------------------------

User 'saucam' has created a pull request for this issue:
https://github.com/apache/spark/pull/3830

> [SparkSQL] java.lang.UnsupportedOperationException when hive partition
> doesn't exist and order by and limit are used
> --------------------------------------------------------------------------------------------------------------------
>
>                  Key: SPARK-4968
>                  URL: https://issues.apache.org/jira/browse/SPARK-4968
>              Project: Spark
>           Issue Type: Bug
>           Components: SQL
>     Affects Versions: 1.1.1
>          Environment: Spark 1.1.1
>                       scala - 2.10.2
>                       hive metastore db - pgsql
>                       OS - Linux
>             Reporter: Shekhar Bansal
>              Fix For: 1.1.1, 1.1.2, 1.2.1
>
> Create a table with partitions, then run a query that selects a partition
> which doesn't exist and contains ORDER BY and LIMIT.
> I am running queries in hiveContext.
>
> 1. Create hive table
>
>     create table if not exists testTable (ID1 BIGINT, ID2 BIGINT, Start_Time STRING, End_Time STRING)
>     PARTITIONED BY (Region STRING, Market STRING)
>     ROW FORMAT DELIMITED
>     FIELDS TERMINATED BY ','
>     LINES TERMINATED BY '\n'
>     STORED AS TEXTFILE;
>
> 2. Create data
>
>     1,2,"2014-11-01","2014-11-02"
>     2,3,"2014-11-01","2014-11-02"
>     3,4,"2014-11-01","2014-11-02"
>
> 3. Load data into hive
>
>     LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable
>     PARTITION (Region="North", market='market1');
>
> 4. Run query
>
>     SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
>
> Error trace:
>
>     java.lang.UnsupportedOperationException: empty collection
>         at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
>         at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
>         at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
>         at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
>         at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
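The trace shows `RDD.takeOrdered` (the physical operator behind ORDER BY + LIMIT) calling `RDD.reduce` to merge per-partition results; `reduce` throws `UnsupportedOperationException` on an empty collection, which is exactly what a scan of a non-existent partition yields. The same failure mode reproduces in plain Scala without Spark. The `safeTopK` helper below is a hypothetical sketch of a guarded merge for illustration only, not the actual patch from the linked pull request:

```scala
object EmptyReduceDemo {
  // Mirrors the failing call: Scala collections, like RDD.reduce, throw
  // UnsupportedOperationException when reduce is applied to an empty collection.
  def reduceThrows(rows: Seq[Int]): Boolean =
    try { rows.reduce(math.max); false }
    catch { case _: UnsupportedOperationException => true }

  // Hypothetical guarded variant: return an empty result instead of merging
  // when the scan produced no rows (sorted descending, take the first k).
  def safeTopK(rows: Seq[Int], k: Int): Seq[Int] =
    if (rows.isEmpty) Seq.empty
    else rows.sorted(Ordering[Int].reverse).take(k)

  def main(args: Array[String]): Unit = {
    println(reduceThrows(Seq.empty))  // same exception class as the error trace
    println(safeTopK(Seq.empty, 100)) // empty result, no exception
  }
}
```

Any fix along these lines only changes behavior for the empty case; a query hitting an existing partition still returns the ordered, limited rows.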