[jira] [Updated] (SPARK-7825) Poor performance in Cross Product due to no combine operations for small files.

2015-09-09 Thread Tang Yan (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tang Yan updated SPARK-7825:

Affects Version/s: (was: 1.3.1)
   (was: 1.2.2)
   (was: 1.2.1)
   (was: 1.3.0)
   (was: 1.2.0)

> Poor performance in Cross Product due to no combine operations for small 
> files.
> ---
>
> Key: SPARK-7825
> URL: https://issues.apache.org/jira/browse/SPARK-7825
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>    Reporter: Tang Yan
>
> When computing a cross product, if one table consists of many small files,
> Spark SQL has to run a very large number of tasks, which leads to poor
> performance. Hive, by contrast, has a CombineHiveInputFormat that can
> combine small files to reduce the number of tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7825) Poor performance in Cross Product due to no combine operations for small files.

2015-05-22 Thread Tang Yan (JIRA)
Tang Yan created SPARK-7825:
---

 Summary: Poor performance in Cross Product due to no combine 
operations for small files.
 Key: SPARK-7825
 URL: https://issues.apache.org/jira/browse/SPARK-7825
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0
Reporter: Tang Yan


When computing a cross product, if one table consists of many small files,
Spark SQL has to run a very large number of tasks, which leads to poor
performance. Hive, by contrast, has a CombineHiveInputFormat that can combine
small files to reduce the number of tasks.
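
To make the task-count concern concrete, here is a rough, standalone sketch in plain Python (it does not use Spark or Hive). It assumes the cross product runs one task per pair of input partitions, and that each small file would otherwise become its own partition. The helper names (combine_splits, cross_product_tasks), the file sizes, and the 128 MB split limit are hypothetical and chosen only for illustration; the greedy packing is a simplification in the spirit of a combining input format, not Hive's actual algorithm.

```python
from typing import List

MB = 1024 * 1024


def combine_splits(file_sizes: List[int], max_split_bytes: int) -> List[List[int]]:
    """Greedily pack file sizes into splits of at most max_split_bytes.

    A file larger than the limit still gets a split of its own.
    (Hypothetical helper; a simplification, not Hive's implementation.)
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


def cross_product_tasks(left_partitions: int, right_partitions: int) -> int:
    """Assumed cost model: one task per pair of input partitions."""
    return left_partitions * right_partitions


# Hypothetical workload: 1000 files of 1 MB on one side,
# 10 partitions on the other side.
small_files = [1 * MB] * 1000
other_side_partitions = 10

# Without combining, every small file is its own partition.
uncombined_tasks = cross_product_tasks(len(small_files), other_side_partitions)

# With combining, files are packed into splits of up to 128 MB.
combined_splits = combine_splits(small_files, max_split_bytes=128 * MB)
combined_tasks = cross_product_tasks(len(combined_splits), other_side_partitions)

print(f"tasks without combining: {uncombined_tasks}")
print(f"splits after combining:  {len(combined_splits)}")
print(f"tasks with combining:    {combined_tasks}")
```

Under these assumptions the task count scales with the product of the two partition counts, so shrinking the small-file side by packing files into larger splits reduces the total by the same factor.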



