[jira] [Commented] (SPARK-26305) Breakthrough the memory limitation of broadcast join

2018-12-13 Thread Lantao Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720257#comment-16720257
 ] 

Lantao Jin commented on SPARK-26305:


I have added a design doc. It is not yet complete.

> Breakthrough the memory limitation of broadcast join
> 
>
> Key: SPARK-26305
> URL: https://issues.apache.org/jira/browse/SPARK-26305
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Lantao Jin
> Priority: Major
>
> When a join between a large table and a small one suffers from data skew, we
> usually resolve it with a broadcast hint in SQL. However, the current
> broadcast join has several limitations, and the primary one is memory. The
> broadcast table must fit in memory on both the driver and the executors.
> Although it can spill to disk when memory is insufficient, it still causes
> an OOM when the "small" table is small only relative to the large one rather
> than small in absolute terms. In our company, we run many large-scale SQL
> analysis jobs that join and shuffle tens to hundreds of terabytes. For
> example, the large table is 100TB and the small one is 10,000 times smaller,
> which is still 10GB. In this case the broadcast join cannot finish because
> the small table is still larger than expected. If the join is skewed, the
> sort-merge join always fails.
> Hive has a skew join hint that triggers a two-stage task to handle skewed
> keys and normal keys separately. I guess Databricks Runtime has a similar
> implementation. However, the skew join hint requires SQL users to know the
> data in their tables intimately: they must know which keys are skewed in a
> join. That is very hard to know, since the data changes day by day and the
> join key differs between queries. Users end up setting a huge partition
> number and trying their luck.
> So, is there a simple, crude, and efficient way to resolve this? Going back
> to the limitation: if the broadcast table did not need to fit in memory, in
> other words, if the driver and executors stored the broadcast table on disk
> only, the problem described above could be resolved.
> A new hint such as BROADCAST_DISK, or an additional parameter on the existing
> BROADCAST hint, will be introduced to cover this case. The existing broadcast
> behavior won't be changed.
> I will offer a design doc if you agree with this direction.
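
The disk-only idea in the quoted description can be illustrated outside Spark. The following is a minimal Python sketch, not Spark code and not the design proposed in this issue. It assumes the small side can be written to a local file indexed by join key (here with the standard-library sqlite3 module) and that the big side is streamed row by row. The names build_disk_table, disk_broadcast_join, and demo, and the (key, value) row shape, are hypothetical and chosen only for illustration.

```python
import os
import sqlite3
import tempfile
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (join_key, payload); hypothetical row shape


def build_disk_table(small_rows: Iterable[Row], path: str) -> sqlite3.Connection:
    """Write the small ("broadcast") side to a disk file indexed by join key."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE small (k TEXT, v TEXT)")
    conn.executemany("INSERT INTO small VALUES (?, ?)", small_rows)
    conn.execute("CREATE INDEX small_k ON small (k)")
    conn.commit()
    return conn


def disk_broadcast_join(
    big_rows: Iterable[Row], conn: sqlite3.Connection
) -> Iterator[Tuple[str, str, str]]:
    """Stream the big side and probe the on-disk table once per row (inner join)."""
    for key, big_value in big_rows:
        matches = conn.execute(
            "SELECT v FROM small WHERE k = ? ORDER BY rowid", (key,)
        )
        for (small_value,) in matches:
            yield key, big_value, small_value


def demo() -> List[Tuple[str, str, str]]:
    with tempfile.TemporaryDirectory() as tmp:
        conn = build_disk_table(
            [("a", "s1"), ("b", "s2")], os.path.join(tmp, "small.db")
        )
        try:
            # Key "a" is skewed on the big side; rows are never regrouped by key,
            # so the skew does not concentrate work in one place.
            big = [("a", "x1"), ("a", "x2"), ("a", "x3"), ("c", "x4"), ("b", "x5")]
            return list(disk_broadcast_join(big, conn))
        finally:
            conn.close()
```

The trade-off in this sketch is that each probe reads from an indexed file instead of an in-memory hash table, so memory use stays bounded at the cost of slower lookups. How Spark would store and probe the broadcast table is left to the design doc.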



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26305) Breakthrough the memory limitation of broadcast join

2018-12-09 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713917#comment-16713917
 ] 

Liang-Chi Hsieh commented on SPARK-26305:

I'm working on SPARK-25549, which provides an API to materialize an RDD in 
order to obtain its statistics. Because it actually materializes the RDD on 
disk, I think it is related and might be usable for this requirement.




[jira] [Commented] (SPARK-26305) Breakthrough the memory limitation of broadcast join

2018-12-07 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712991#comment-16712991
 ] 

Dongjoon Hyun commented on SPARK-26305:

+1 for the idea.




[jira] [Commented] (SPARK-26305) Breakthrough the memory limitation of broadcast join

2018-12-07 Thread Lantao Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712855#comment-16712855
 ] 

Lantao Jin commented on SPARK-26305:


CC [~jiangxb1987] [~cloud_fan] [~dongjoon] [~hyukjin.kwon], thoughts?
