[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-06-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10302:
---
Fix Version/s: 1.3.0 (was: spark-branch)

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 1.3.0

 Attachments: 10302.patch, HIVE-10302.2-spark.patch, 
 HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch


 Usually there are multiple cores in a Spark executor, so multiple map-join 
 tasks may run in the same executor (concurrently or sequentially). Currently, 
 each task loads its own copy of the small tables for the map join into memory, 
 which is inefficient. Ideally, the small tables would be loaded only once and 
 shared among the tasks running in that executor.
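The idea in the description is to load each small table once per executor and share it among the tasks there. It can be sketched with a JVM-wide, thread-safe cache. This is a minimal sketch and not the code in the attached patches. The names `SmallTableCache` and `countLoads`, the string key `small_table_a`, and the list of rows standing in for a map-join hash table are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SmallTableCacheDemo {

    // Hypothetical executor-wide cache: one instance per executor JVM,
    // shared by every task thread running in that executor.
    static final class SmallTableCache<K, V> {
        private final ConcurrentHashMap<K, V> tables = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        SmallTableCache(Function<K, V> loader) {
            this.loader = loader;
        }

        // computeIfAbsent applies the loader at most once per key, even when
        // several task threads request the same small table concurrently.
        V get(K key) {
            return tables.computeIfAbsent(key, loader);
        }
    }

    // Simulates 'tasks' map-join tasks on 'cores' threads that all need the
    // same small table; returns how many times the table was actually loaded.
    static int countLoads(int cores, int tasks) throws Exception {
        AtomicInteger loads = new AtomicInteger();
        SmallTableCache<String, List<String>> cache = new SmallTableCache<>(key -> {
            loads.incrementAndGet(); // stands in for reading the table from HDFS
            List<String> rows = new ArrayList<>();
            rows.add(key + ":row0");
            return rows;
        });
        ExecutorService executor = Executors.newFixedThreadPool(cores);
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(executor.submit(() -> cache.get("small_table_a")));
            }
            for (Future<List<String>> f : results) {
                f.get(); // wait for every task to finish
            }
        } finally {
            executor.shutdown();
        }
        return loads.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("loads = " + countLoads(4, 16)); // prints "loads = 1"
    }
}
```

`computeIfAbsent` is used because `ConcurrentHashMap` applies the mapping function atomically, at most once per key. Concurrent tasks in one executor therefore wait briefly instead of each loading its own copy. A real cache would also need an eviction or cleanup policy, which this sketch omits.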



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-06-01 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10302:
---
Attachment: 10302.patch

Patch 10302.patch (without the HIVE- prefix) is the result of rebasing onto the 
latest master; it is the version actually committed to master.

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: 10302.patch, HIVE-10302.2-spark.patch, 
 HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch







[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-04-24 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10302:
---
Attachment: HIVE-10302.4-spark.patch

Thanks a lot for the discussion and review. Yes, the Spark-related test 
failures are caused by this patch: the loaded small tables are cleared after 
the map join, so the table container in the cache is empty. Attached v4, which 
doesn't clear the small tables when running on Spark with a dedicated cluster, 
where the small tables are cached.
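The failure described in this comment can be reproduced in a few lines. When a cached container is shared, the first task's cleanup empties it for every later task. This is a simplified sketch under several assumptions. A plain `HashMap` stands in for Hive's map-join table container, and the names `runTask`, `runTwoTasks`, `skipClearWhenCached` and `small_table_a` are hypothetical. The comment's condition (Spark, dedicated cluster, small tables cached) is collapsed into a single boolean flag.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapJoinCloseDemo {

    // Stands in for loading a small table from HDFS into a hash-table container.
    static Map<String, String> loadSmallTable() {
        Map<String, String> container = new HashMap<>();
        container.put("k1", "v1");
        return container;
    }

    // One map-join task using the cached (shared) container: probe, then close.
    // Returns 1 if the probe key was found, 0 otherwise.
    static int runTask(Map<String, Map<String, String>> cache, boolean skipClearWhenCached) {
        Map<String, String> container =
                cache.computeIfAbsent("small_table_a", k -> loadSmallTable());
        int hit = container.containsKey("k1") ? 1 : 0;
        if (!skipClearWhenCached) {
            // Old behavior: free the table after the join. Because the container
            // is shared through the cache, later tasks now see an empty table.
            container.clear();
        }
        return hit;
    }

    // Runs two tasks that share one executor-wide cache; returns their hit counts.
    static int[] runTwoTasks(boolean skipClearWhenCached) {
        Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
        return new int[] {
            runTask(cache, skipClearWhenCached),
            runTask(cache, skipClearWhenCached)
        };
    }

    public static void main(String[] args) {
        int[] before = runTwoTasks(false); // second task finds an empty container
        int[] after = runTwoTasks(true);   // cached container is left intact
        // prints "1,0 vs 1,1"
        System.out.println(before[0] + "," + before[1] + " vs " + after[0] + "," + after[1]);
    }
}
```

The cache key is still present after `clear()`, so `computeIfAbsent` does not reload the table. The second task silently joins against an empty container, which matches the symptom reported above.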

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, 
 HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch







[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-04-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10302:
---
Attachment: HIVE-10302.2-spark.patch

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.2-spark.patch, HIVE-10302.spark-1.patch







[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-04-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10302:
---
Attachment: HIVE-10302.3-spark.patch

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, 
 HIVE-10302.spark-1.patch







[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once[Spark Branch]

2015-04-21 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10302:
---
Summary: Load small tables (for map join) in executor memory only 
once[Spark Branch]  (was: Cache small tables in memory [Spark Branch])

 Load small tables (for map join) in executor memory only once[Spark Branch]
 ---

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.spark-1.patch


 If we can cache small tables in executor memory, we could save some time in 
 loading them from HDFS.





[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-04-21 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10302:
---
Summary: Load small tables (for map join) in executor memory only once 
[Spark Branch]  (was: Load small tables (for map join) in executor memory only 
once[Spark Branch])

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.spark-1.patch







[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]

2015-04-21 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10302:
---
Description: Usually there are multiple cores in a Spark executor, so multiple 
map-join tasks may run in the same executor (concurrently or sequentially). 
Currently, each task loads its own copy of the small tables for the map join 
into memory, which is inefficient. Ideally, the small tables would be loaded 
only once and shared among the tasks running in that executor.  (was: If we can 
cache small tables in executor memory, we could save some time in loading them 
from HDFS.)

 Load small tables (for map join) in executor memory only once [Spark Branch]
 

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.spark-1.patch




