[jira] [Created] (SPARK-5951) Remove unreachable driver memory properties in yarn client mode (YarnClientSchedulerBackend)
Shekhar Bansal created SPARK-5951:
-------------------------------------

             Summary: Remove unreachable driver memory properties in yarn client mode (YarnClientSchedulerBackend)
                 Key: SPARK-5951
                 URL: https://issues.apache.org/jira/browse/SPARK-5951
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.3.0
         Environment: yarn
            Reporter: Shekhar Bansal
            Priority: Trivial
             Fix For: 1.3.0


SPARK-4730 added a deprecation warning for these settings, and SPARK-1953 removed the driver memory configs in yarn-client mode. During that integration, spark.master.memory and SPARK_MASTER_MEMORY were not removed.
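For context, a minimal sketch of the mapping this issue refers to. The tuple list below is reconstructed from memory of the 1.2-era YarnClientSchedulerBackend and the property names in the description, so treat it as illustrative rather than exact source:

{code}
// yarn-client mode translated env vars and system properties into
// ClientArguments options via (option, env var, property) tuples.
// After SPARK-1953 dropped --driver-memory handling in client mode,
// the entries feeding it became unreachable and can be deleted.
val optionTuples = List(
  ("--driver-memory", "SPARK_MASTER_MEMORY", "spark.master.memory"),   // unreachable: remove
  ("--driver-memory", "SPARK_DRIVER_MEMORY", "spark.driver.memory"),   // unreachable: remove
  ("--num-executors", "SPARK_WORKER_INSTANCES", "spark.worker.instances"),
  ("--executor-memory", "SPARK_EXECUTOR_MEMORY", "spark.executor.memory")
)
{code}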
[jira] [Updated] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Description: I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024), which is a waste of resources.

    was: I am using {code}spark.driver.memory=6g{code}, which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024), which is a waste of resources.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024), which is a waste of resources.
[jira] [Updated] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Description: I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.

    was: I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024), which is a waste of resources.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Comment Edited] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324056#comment-14324056 ]

Shekhar Bansal edited comment on SPARK-5861 at 2/17/15 11:14 AM:
-----------------------------------------------------------------

Thanks for the quick reply. I know all this; I meant yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.

was (Author: sb58):
Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024). The application master doesn't need 7g in yarn-client mode.
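To make the numbers concrete, here is a small worked sketch of that formula, assuming the 1.2-era constants MEMORY_OVERHEAD_FACTOR = 0.07 and MEMORY_OVERHEAD_MIN = 384 (values recalled from the 1.2 YARN code; verify against your version):

{code}
// Why --driver-memory 6g yields a 7g AM container in yarn-client mode.
val MEMORY_OVERHEAD_FACTOR = 0.07
val MEMORY_OVERHEAD_MIN = 384
val minAllocationMb = 1024        // yarn.scheduler.minimum-allocation-mb

val amMemory = 6 * 1024           // 6144 MB, copied from spark.driver.memory
val amMemoryOverhead =
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN)   // 430 MB
val requested = amMemory + amMemoryOverhead                                  // 6574 MB
// YARN rounds each container request up to a multiple of the minimum allocation:
val granted =
  math.ceil(requested.toDouble / minAllocationMb).toInt * minAllocationMb
println(granted)                  // 7168 MB = 7g, even though the driver runs client-side
{code}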
[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324069#comment-14324069 ]

Shekhar Bansal commented on SPARK-5861:
---------------------------------------

I am submitting my job using spark-submit. Reproducible with:
{code}
spark-submit --master yarn-client --driver-memory 6g --class org.apache.spark.examples.SparkPi spark-examples-1.2.1-hadoop2.4.0.jar
{code}


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024). The application master doesn't need 7g in yarn-client mode.
[jira] [Issue Comment Deleted] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Comment: was deleted

(was: I am submitting my job using spark-submit. Reproducible with:
{code}
spark-submit --master yarn-client --driver-memory 6g --class org.apache.spark.examples.SparkPi spark-examples-1.2.1-hadoop2.4.0.jar
{code})


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024). The application master doesn't need 7g in yarn-client mode.
[jira] [Issue Comment Deleted] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Comment: was deleted

(was: Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.)


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324055#comment-14324055 ]

Shekhar Bansal commented on SPARK-5861:
---------------------------------------

Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324053#comment-14324053 ]

Shekhar Bansal commented on SPARK-5861:
---------------------------------------

Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Issue Comment Deleted] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Comment: was deleted

(was: Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.)


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324056#comment-14324056 ]

Shekhar Bansal commented on SPARK-5861:
---------------------------------------

Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Issue Comment Deleted] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Comment: was deleted

(was: Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.)


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324054#comment-14324054 ]

Shekhar Bansal commented on SPARK-5861:
---------------------------------------

Thanks for the quick reply. I know all this; I mean yarn-client mode only.

In org.apache.spark.deploy.yarn.ClientArguments:
{code}
amMemory = driverMemory   // i.e. the --driver-memory value
amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
{code}
There is no check for spark.master. In the above case, I think we are wasting 5g of memory.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.
[jira] [Created] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
Shekhar Bansal created SPARK-5861:
-------------------------------------

             Summary: [yarn-client mode] Application master should not use memory = spark.driver.memory
                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2


I am using {code}spark.driver.memory=6g{code}, which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024), which is a waste of resources.
[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324070#comment-14324070 ]

Shekhar Bansal commented on SPARK-5861:
---------------------------------------

I am submitting my job using spark-submit. Reproducible with:
{code}
spark-submit --master yarn-client --driver-memory 6g --class org.apache.spark.examples.SparkPi spark-examples-1.2.1-hadoop2.4.0.jar
{code}


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024). The application master doesn't need 7g in yarn-client mode.
[jira] [Updated] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-5861:
----------------------------------

    Description: I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024). The application master doesn't need 7g in yarn-client mode.

    was: I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024) which is a waste of resources.


[yarn-client mode] Application master should not use memory = spark.driver.memory
----------------------------------------------------------------------------------

                 Key: SPARK-5861
                 URL: https://issues.apache.org/jira/browse/SPARK-5861
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.1
            Reporter: Shekhar Bansal
             Fix For: 1.3.0, 1.2.2

I am using {code}spark.driver.memory=6g{code} which creates an application master of 7g (yarn.scheduler.minimum-allocation-mb=1024). The application master doesn't need 7g in yarn-client mode.
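As a practical note, Spark 1.3 (one of the Fix For versions here) gained a separate spark.yarn.am.memory setting so the client-mode AM no longer has to inherit the driver size. A minimal sketch, assuming that setting is available in your version:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// In yarn-client mode the driver runs in the submitting JVM, so the AM
// container can be sized independently of the driver heap.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("SparkPi")
  // The driver heap itself must still be set at launch time (e.g. via
  // spark-submit --driver-memory 6g); setting spark.driver.memory here
  // would be too late in client mode.
  .set("spark.yarn.am.memory", "1g")   // request a small AM container instead of 7g
val sc = new SparkContext(conf)        // AM launches with ~1g plus overhead
{code}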
[jira] [Commented] (SPARK-5081) Shuffle write increases
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308596#comment-14308596 ]

Shekhar Bansal commented on SPARK-5081:
---------------------------------------

I faced the same problem; moving to lz4 compression did the trick for me. Try spark.io.compression.codec=lz4.


Shuffle write increases
-----------------------

                 Key: SPARK-5081
                 URL: https://issues.apache.org/jira/browse/SPARK-5081
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 1.2.0
            Reporter: Kevin Jung

The shuffle write size shown in the Spark web UI differs markedly when I execute the same Spark job with the same input data on Spark 1.1 and Spark 1.2. At the sortBy stage, the shuffle write is 98.1MB in Spark 1.1 but 146.9MB in Spark 1.2. I set the spark.shuffle.manager option to hash because its default value changed, but Spark 1.2 still writes more shuffle output than Spark 1.1. This can significantly increase disk I/O overhead as the input file gets bigger, and it causes jobs to take more time to complete. With about 100GB of input, for example, the shuffle write is 39.7GB in Spark 1.1 but 91.0GB in Spark 1.2.

spark 1.1
||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
|9|saveAsTextFile| |1169.4KB| |
|12|combineByKey| |1265.4KB|1275.0KB|
|6|sortByKey| |1276.5KB| |
|8|mapPartitions| |91.0MB|1383.1KB|
|4|apply| |89.4MB| |
|5|sortBy|155.6MB| |98.1MB|
|3|sortBy|155.6MB| | |
|1|collect| |2.1MB| |
|2|mapValues|155.6MB| |2.2MB|
|0|first|184.4KB| | |

spark 1.2
||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
|12|saveAsTextFile| |1170.2KB| |
|11|combineByKey| |1264.5KB|1275.0KB|
|8|sortByKey| |1273.6KB| |
|7|mapPartitions| |134.5MB|1383.1KB|
|5|zipWithIndex| |132.5MB| |
|4|sortBy|155.6MB| |146.9MB|
|3|sortBy|155.6MB| | |
|2|collect| |2.0MB| |
|1|mapValues|155.6MB| |2.2MB|
|0|first|184.4KB| | |
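For anyone who wants to try the suggestion, a minimal sketch of setting the codec programmatically; the same property can also be passed as a --conf flag to spark-submit:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Switch Spark's I/O (shuffle/spill) compression codec to lz4,
// per the workaround suggested above.
val conf = new SparkConf()
  .setAppName("ShuffleWriteTest")
  .set("spark.io.compression.codec", "lz4")
val sc = new SparkContext(conf)
{code}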
[jira] [Created] (SPARK-4968) [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
Shekhar Bansal created SPARK-4968:
-------------------------------------

             Summary: [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
                 Key: SPARK-4968
                 URL: https://issues.apache.org/jira/browse/SPARK-4968
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.1
         Environment: Spark 1.1.1
hive metastore db - pgsql
OS - Linux
            Reporter: Shekhar Bansal
             Fix For: 1.1.2, 1.2.1, 1.1.1


Create a table with partitions, then run a query with ORDER BY and LIMIT against a partition which doesn't exist. I am running the queries in hiveContext.

1. Create hive table
{code}
create table if not exists testTable (ID1 BIGINT, ID2 BIGINT, Start_Time STRING, End_Time STRING)
PARTITIONED BY (Region STRING, Market STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
{code}

2. Create data (/tmp/input.txt)
{code}
1,2,2014-11-01,2014-11-02
2,3,2014-11-01,2014-11-02
3,4,2014-11-01,2014-11-02
{code}

3. Load data in hive
{code}
LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable PARTITION (Region='North', Market='market1');
{code}

4. Run query
{code}
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
{code}

Error trace:
{code}
java.lang.UnsupportedOperationException: empty collection
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
	at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
	at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
{code}
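A minimal Scala sketch of the failing call path, assuming a Spark 1.1.1 shell where sc is in scope and the table/load steps above have already run:

{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// market='market2' matches no partition, so the underlying RDD has no rows;
// ORDER BY ... LIMIT plans a TakeOrdered, whose takeOrdered/reduce then
// throws "empty collection" instead of returning an empty result.
val result = hiveContext.sql(
  "SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100")
result.collect()   // java.lang.UnsupportedOperationException: empty collection
{code}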
[jira] [Updated] (SPARK-4968) [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
[ https://issues.apache.org/jira/browse/SPARK-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Bansal updated SPARK-4968:
----------------------------------

    Environment:
Spark 1.1.1
scala - 2.10.2
hive metastore db - pgsql
OS - Linux

    was:
Spark 1.1.1
hive metastore db - pgsql
OS - Linux


[SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
---------------------------------------------------------------------------------------------------------------------

                 Key: SPARK-4968
                 URL: https://issues.apache.org/jira/browse/SPARK-4968
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.1
         Environment: Spark 1.1.1
scala - 2.10.2
hive metastore db - pgsql
OS - Linux
            Reporter: Shekhar Bansal
             Fix For: 1.1.1, 1.1.2, 1.2.1

Create a table with partitions, then run a query with ORDER BY and LIMIT against a partition which doesn't exist. I am running the queries in hiveContext.

1. Create hive table
{code}
create table if not exists testTable (ID1 BIGINT, ID2 BIGINT, Start_Time STRING, End_Time STRING)
PARTITIONED BY (Region STRING, Market STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
{code}

2. Create data (/tmp/input.txt)
{code}
1,2,2014-11-01,2014-11-02
2,3,2014-11-01,2014-11-02
3,4,2014-11-01,2014-11-02
{code}

3. Load data in hive
{code}
LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable PARTITION (Region='North', Market='market1');
{code}

4. Run query
{code}
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
{code}

Error trace:
{code}
java.lang.UnsupportedOperationException: empty collection
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
	at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
	at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
{code}