[jira] [Commented] (SPARK-21650) Insert into hive partitioned table from spark-sql taking hours to complete

2017-08-07 Thread Madhavi Vaddepalli (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116187#comment-16116187 ]

Madhavi Vaddepalli commented on SPARK-21650:


Thank you Sean Owen.

-Madhavi.

> Insert into hive partitioned table from spark-sql taking hours to complete
> --
>
> Key: SPARK-21650
> URL: https://issues.apache.org/jira/browse/SPARK-21650
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Linux machines
> Spark version - 1.6.0
> Hive Version - 1.1
> 200- number of executors.
> 3 - number of executor cores.
> 10g - executor and driver memory.
> dynamic allocation enabled.
>Reporter: Madhavi Vaddepalli
>
> We are trying to execute some logic using Spark SQL:
> Input to the program: 7 billion records (60 GB gzip-compressed, text format).
> Output: 7 billion records (260 GB gzip-compressed, partitioned on a few 
> columns).
>   The output has 1 partitions (1 different combinations 
> of partition columns).
> We are trying to insert this output into a Hive table (text format, gzip-
> compressed).
> All the spawned tasks finished in 33 minutes and all the executors 
> were decommissioned; only the driver remained active. It stayed in this 
> state, without showing any active stage or task in the Spark UI, for about 
> 2.5 hours, and then completed successfully.
> Please let us know what can be done to improve performance here. (Is it 
> fixed in later versions?)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21650) Insert into hive partitioned table from spark-sql taking hours to complete

2017-08-07 Thread Madhavi Vaddepalli (JIRA)
Madhavi Vaddepalli created SPARK-21650:
--

 Summary: Insert into hive partitioned table from spark-sql taking hours to complete
 Key: SPARK-21650
 URL: https://issues.apache.org/jira/browse/SPARK-21650
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
 Environment: Linux machines
Spark version - 1.6.0
Hive Version - 1.1
200- number of executors.
3 - number of executor cores.
10g - executor and driver memory.
dynamic allocation enabled.
Reporter: Madhavi Vaddepalli


We are trying to execute some logic using Spark SQL:
Input to the program: 7 billion records (60 GB gzip-compressed, text format).
Output: 7 billion records (260 GB gzip-compressed, partitioned on a few 
columns).
  The output has 1 partitions (1 different combinations 
of partition columns).

We are trying to insert this output into a Hive table (text format, gzip-
compressed).
All the spawned tasks finished in 33 minutes and all the executors were 
decommissioned; only the driver remained active. It stayed in this state, 
without showing any active stage or task in the Spark UI, for about 2.5 
hours, and then completed successfully.

Please let us know what can be done to improve performance here. (Is it 
fixed in later versions?)
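The symptom described above (all tasks done, executors gone, driver busy for hours) typically points at the driver-side output-commit phase, where the committer renames or moves each output file sequentially before the partitions become visible. As a minimal, hedged sketch only: the settings below are commonly suggested for this class of slowdown on Spark 1.6 with Hive tables, but they are assumptions for illustration, not a confirmed fix for this ticket, and the job/script names are hypothetical.

```shell
# Hypothetical spark-submit invocation (script name and resources are placeholders).
# mapreduce.fileoutputcommitter.algorithm.version=2 makes the commit phase move
# task output directly to the final location instead of a second sequential
# rename pass during job commit -- often the dominant cost when a job writes
# many partition directories/files.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --num-executors 200 \
  --executor-cores 3 \
  --executor-memory 10g \
  --driver-memory 10g \
  insert_into_partitioned_table.py
```

Independently of the committer, reducing the number of output files (for example by repartitioning on the partition columns before the INSERT, so each partition is written by fewer tasks) shrinks the amount of per-file work the driver must do at commit time. Note that algorithm version 2 trades away some failure-atomicity during commit, so it is a judgment call rather than a drop-in default.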


