[jira] [Created] (SPARK-13735) Log for parquet relation reading files is too verbose

2016-03-07 Thread Zhong Wang (JIRA)
Zhong Wang created SPARK-13735:
--

 Summary: Log for parquet relation reading files is too verbose
 Key: SPARK-13735
 URL: https://issues.apache.org/jira/browse/SPARK-13735
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.0
Reporter: Zhong Wang
Priority: Trivial


The INFO-level logging lists every file read by ParquetRelation, which is far too verbose when the input contains many files.
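Until the message is demoted upstream, one possible workaround is to raise the log level for the Parquet data source in the driver. This is only a sketch: the logger name below is an assumption and may differ between Spark versions, so verify it against the actual log output.

```scala
// Hypothetical workaround: silence the per-file INFO lines from the Parquet
// data source. The logger name is an assumption; check it in your own logs.
import org.apache.log4j.{Level, Logger}

Logger.getLogger("org.apache.spark.sql.execution.datasources.parquet")
  .setLevel(Level.WARN)
```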



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13704) TaskSchedulerImpl.createTaskSetManager can be expensive, and result in lost executors due to blocked heartbeats

2016-03-05 Thread Zhong Wang (JIRA)
Zhong Wang created SPARK-13704:
--

 Summary: TaskSchedulerImpl.createTaskSetManager can be expensive, 
and result in lost executors due to blocked heartbeats
 Key: SPARK-13704
 URL: https://issues.apache.org/jira/browse/SPARK-13704
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0, 1.5.2, 1.4.1, 1.3.1
Reporter: Zhong Wang


In some cases, TaskSchedulerImpl.createTaskSetManager can be expensive. For 
example, in a YARN cluster, it may call the topology script for rack awareness. 
When submitting a very large job in a very large YARN cluster, the topology 
script may take significant time to run, and this blocks receiving executors' 
heartbeats, which may result in lost executors.
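The failure mode is not Spark-specific; a minimal plain-Scala sketch (not the actual scheduler code) of the pattern — slow work done while holding a shared lock, delaying a lock-dependent "heartbeat" — might look like:

```scala
// Sketch of the contention pattern: slowSubmit stands in for submitTasks
// calling a slow topology script while holding the scheduler lock;
// heartbeatDelayMillis stands in for heartbeat handling that needs the
// same lock and is therefore blocked.
object LockContentionSketch {
  private val schedulerLock = new Object

  // Holds the lock while doing slow work (stand-in for the topology script).
  def slowSubmit(): Unit = schedulerLock.synchronized {
    Thread.sleep(300)
  }

  // Needs the lock only briefly, but must wait for slowSubmit to release it.
  def heartbeatDelayMillis(): Long = {
    val start = System.nanoTime()
    schedulerLock.synchronized {}
    (System.nanoTime() - start) / 1000000
  }

  def run(): Long = {
    val submitter = new Thread(new Runnable {
      def run(): Unit = slowSubmit()
    })
    submitter.start()
    Thread.sleep(50) // give the submitter time to acquire the lock first
    val delay = heartbeatDelayMillis()
    submitter.join()
    delay
  }

  def main(args: Array[String]): Unit =
    println(s"heartbeat was blocked for ~${run()}ms")
}
```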

Stack traces we observed that are related to this issue:
{code}
"dag-scheduler-event-loop" daemon prio=10 tid=0x7f8392875800 nid=0x26e8 runnable [0x7f83576f4000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:272)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0xf551f460> (a java.lang.UNIXProcess$ProcessPipeInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    - locked <0xf5529740> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.read1(BufferedReader.java:205)
    at java.io.BufferedReader.read(BufferedReader.java:279)
    - locked <0xf5529740> (a java.io.InputStreamReader)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:728)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:524)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
    at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
    at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
    at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
    at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
    at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:210)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:189)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$addPendingTask(TaskSetManager.scala:189)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$1.apply$mcVI$sp(TaskSetManager.scala:158)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:157)
    at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:187)
    at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:161)
    - locked <0xea3b8a88> (a org.apache.spark.scheduler.cluster.YarnScheduler)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:872)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

"sparkDriver-akka.actor.default-dispatcher-15" daemon prio=10 tid=0x7f829c02 nid=0x2737 waiting for monitor entry [0x7f8355ebd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.spark.scheduler.TaskSchedule
{code}

[jira] [Commented] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-03-03 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179339#comment-15179339
 ] 

Zhong Wang commented on SPARK-13337:


The current join method with the usingColumns argument generates results like 
TableC. The limitation is that it doesn't support null-safe joins.

> DataFrame join-on-columns function should support null-safe equal
> -
>
> Key: SPARK-13337
> URL: https://issues.apache.org/jira/browse/SPARK-13337
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Zhong Wang
>Priority: Minor
>
> Currently, the join-on-columns function:
> {code}
> def join(right: DataFrame, usingColumns: Seq[String], joinType: String): 
> DataFrame
> {code}
> performs a null-unsafe join. It would be great if there were an option for
> a null-safe join.






[jira] [Comment Edited] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-03-01 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175154#comment-15175154
 ] 

Zhong Wang edited comment on SPARK-13337 at 3/2/16 6:50 AM:


suppose we are joining two tables:
--
TableA
||key1||key2||value1||
|null|k1|v1|
|k2|k3|v2|

TableB
||key1||key2||value2||
|null|k1|v3|
|k4|k5|v4|

The result table I want is:
--
TableC
||key1||key2||value1||value2||
|null|k1|v1|v3|
|k2|k3|v2|null|
|k4|k5|null|v4|

We cannot use the current join-using-columns interface, because it doesn't 
support null-safe joins, and we have null values in the first row.

We cannot use join-select with an explicit "<=>" either, because the output 
table will look like:
--
||df1.key1||df1.key2||df2.key1||df2.key2||value1||value2||
|null|k1|null|k1|v1|v3|
|k2|k3|null|null|v2|null|
|null|null|k4|k5|null|v4|

It is difficult to get a result like TableC using a select clause, because the 
null values from the outer join (rows 2 & 3) can appear in both the df1.* and 
df2.* columns.

Hope this makes sense to you. I'd like to submit a PR if this is a real use case.
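As a workaround sketch (df1 and df2 are hypothetical DataFrames with the columns above, not code from this ticket), one can do the null-safe outer join explicitly and then coalesce the duplicated key columns to recover the TableC shape:

```scala
// Assumed setup: df1 has (key1, key2, value1), df2 has (key1, key2, value2).
import org.apache.spark.sql.functions.coalesce

val joined = df1.join(df2,
  df1("key1") <=> df2("key1") && df1("key2") <=> df2("key2"),
  "outer")

// Merge each pair of duplicated key columns into a single column.
val result = joined.select(
  coalesce(df1("key1"), df2("key1")).as("key1"),
  coalesce(df1("key2"), df2("key2")).as("key2"),
  df1("value1"),
  df2("value2"))
```

This requires repeating the coalesce for every join key, which is exactly the boilerplate a null-safe option on the usingColumns overload would remove.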


was (Author: zwang):
suppose we have two tables:
--
TableA
||key1||key2||value1||
|null|k1|v1|
|k2|k3|v2|

TableB
||key1||key2||value2||
|null|k1|v3|
|k4|k5|v4|

The result table I want is:
--
TableC
||key1||key2||value1||value2||
|null|k1|v1|v3|
|k2|k3|v2|null|
|k4|k5|null|v4|

We cannot use the current join-using-columns interface, because it doesn't 
support null-safe joins, and we have null values in the first row.

We cannot use join-select with an explicit "<=>" either, because the output 
table will look like:
--
||df1.key1||df1.key2||df2.key1||df2.key2||value1||value2||
|null|k1|null|k1|v1|v3|
|k2|k3|null|null|v2|null|
|null|null|k4|k5|null|v4|

It is difficult to get a result like TableC using a select clause, because the 
null values from the outer join (rows 2 & 3) can appear in both the df1.* and 
df2.* columns.

Hope this makes sense to you. I'd like to submit a PR if this is a real use case.







[jira] [Commented] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-03-01 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175154#comment-15175154
 ] 

Zhong Wang commented on SPARK-13337:


suppose we have two tables:
--
TableA
||key1||key2||value1||
|null|k1|v1|
|k2|k3|v2|

TableB
||key1||key2||value2||
|null|k1|v3|
|k4|k5|v4|

The result table I want is:
--
TableC
||key1||key2||value1||value2||
|null|k1|v1|v3|
|k2|k3|v2|null|
|k4|k5|null|v4|

We cannot use the current join-using-columns interface, because it doesn't 
support null-safe joins, and we have null values in the first row.

We cannot use join-select with an explicit "<=>" either, because the output 
table will look like:
--
||df1.key1||df1.key2||df2.key1||df2.key2||value1||value2||
|null|k1|null|k1|v1|v3|
|k2|k3|null|null|v2|null|
|null|null|k4|k5|null|v4|

It is difficult to get a result like TableC using a select clause, because the 
null values from the outer join (rows 2 & 3) can appear in both the df1.* and 
df2.* columns.

Hope this makes sense to you. I'd like to submit a PR if this is a real use case.







[jira] [Comment Edited] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-03-01 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175154#comment-15175154
 ] 

Zhong Wang edited comment on SPARK-13337 at 3/2/16 6:50 AM:


suppose we have two tables:
--
TableA
||key1||key2||value1||
|null|k1|v1|
|k2|k3|v2|

TableB
||key1||key2||value2||
|null|k1|v3|
|k4|k5|v4|

The result table I want is:
--
TableC
||key1||key2||value1||value2||
|null|k1|v1|v3|
|k2|k3|v2|null|
|k4|k5|null|v4|

We cannot use the current join-using-columns interface, because it doesn't 
support null-safe joins, and we have null values in the first row.

We cannot use join-select with an explicit "<=>" either, because the output 
table will look like:
--
||df1.key1||df1.key2||df2.key1||df2.key2||value1||value2||
|null|k1|null|k1|v1|v3|
|k2|k3|null|null|v2|null|
|null|null|k4|k5|null|v4|

It is difficult to get a result like TableC using a select clause, because the 
null values from the outer join (rows 2 & 3) can appear in both the df1.* and 
df2.* columns.

Hope this makes sense to you. I'd like to submit a PR if this is a real use case.


was (Author: zwang):
suppose we have two tables:
--
TableA
||key1||key2||value1||
|null|k1|v1|
|k2|k3|v2|

TableB
||key1||key2||value2||
|null|k1|v3|
|k4|k5|v4|

The result table I want is:
--
TableC
||key1||key2||value1||value2||
|null|k1|v1|v3|
|k2|k3|v2|null|
|k4|k5|null|v4|

We cannot use the current join-using-columns interface, because it doesn't 
support null-safe joins, and we have null values in the first row.

We cannot use join-select with an explicit "<=>" either, because the output 
table will look like:
--
||df1.key1||df1.key2||df2.key1||df2.key2||value1||value2||
|null|k1|null|k1|v1|v3|
|k2|k3|null|null|v2|null|
|null|null|k4|k5|null|v4|

It is difficult to get a result like TableC using a select clause, because the 
null values from the outer join (rows 2 & 3) can appear in both the df1.* and 
df2.* columns.

Hope this makes sense to you. I'd like to submit a PR if this is a real use case.







[jira] [Comment Edited] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-29 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172881#comment-15172881
 ] 

Zhong Wang edited comment on SPARK-13337 at 2/29/16 11:40 PM:
--

It doesn't help in my case, because it doesn't support null-safe joins. It 
would be great if there were an interface like:

{code}
def join(right: DataFrame, usingColumns: Seq[String], joinType: String, 
nullSafe: Boolean): DataFrame
{code}

The current join-using-columns interface works great if the joining tables 
don't contain null values: it automatically eliminates the null columns 
generated from outer joins. The general join methods in your example support 
null-safe joins perfectly, but they cannot automatically eliminate the null 
columns generated from outer joins.

Sorry that it is a little bit complicated here. Please let me know if you need 
a concrete example.


was (Author: zwang):
It doesn't help in my case, because it doesn't support null-safe joins. It 
would be great if there were an interface like:

{code}
def join(right: DataFrame, usingColumns: Seq[String], joinType: String, 
nullSafe: Boolean): DataFrame
{code}

It works great if the joining tables don't contain null values: it 
automatically eliminates the null columns generated from outer joins. The 
general join methods in your example support null-safe joins perfectly, but 
they cannot automatically eliminate the null columns generated from outer 
joins.

Sorry that it is a little bit complicated here. Please let me know if you need 
a concrete example.







[jira] [Commented] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-29 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172881#comment-15172881
 ] 

Zhong Wang commented on SPARK-13337:


It doesn't help in my case, because it doesn't support null-safe joins. It 
would be great if there were an interface like:

{code}
def join(right: DataFrame, usingColumns: Seq[String], joinType: String, 
nullSafe: Boolean): DataFrame
{code}

It works great if the joining tables don't contain null values: it 
automatically eliminates the null columns generated from outer joins. The 
general join methods in your example support null-safe joins perfectly, but 
they cannot automatically eliminate the null columns generated from outer 
joins.

Sorry that it is a little bit complicated here. Please let me know if you need 
a concrete example.







[jira] [Comment Edited] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-29 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172709#comment-15172709
 ] 

Zhong Wang edited comment on SPARK-13337 at 2/29/16 10:05 PM:
--

For an outer join, it is difficult to eliminate the null columns from the 
result, because the null columns can come from both tables. The 
`join-using-column` interface can automatically eliminate those columns, which 
is very convenient. Sorry that I missed this point in my last reply.


was (Author: zwang):
For an outer join, it is difficult to eliminate the null columns from the 
result. The `join-using-column` interface can automatically eliminate those 
columns, which is very convenient. Sorry that I missed this point in my last 
reply.







[jira] [Commented] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-29 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172709#comment-15172709
 ] 

Zhong Wang commented on SPARK-13337:


For an outer join, it is difficult to eliminate the null columns from the 
result. The `join-using-column` interface can automatically eliminate those 
columns, which is very convenient. Sorry that I missed this point in my last 
reply.







[jira] [Commented] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-21 Thread Zhong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156579#comment-15156579
 ] 

Zhong Wang commented on SPARK-13337:


Unfortunately no... I use the join-on-columns function to perform a natural 
join. It eliminates the redundant columns in the resulting table, which is 
required by our use case.
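For readers following along, a sketch of the difference being described (df1/df2 are hypothetical, with the schemas from the earlier comments): join-on-columns de-duplicates the join keys, while an explicit condition keeps both copies.

```scala
// Assumed: df1 has (key1, key2, value1) and df2 has (key1, key2, value2).

// Natural-join style: one copy of each key column in the result,
// i.e. columns (key1, key2, value1, value2).
val natural = df1.join(df2, Seq("key1", "key2"), "outer")

// Explicit condition: both sides' key columns survive, so the result
// carries redundant key1/key2 columns from each input.
val explicit = df1.join(df2,
  df1("key1") === df2("key1") && df1("key2") === df2("key2"),
  "outer")
```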







[jira] [Updated] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-16 Thread Zhong Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Wang updated SPARK-13337:
---
Description: 
Currently, the join-on-columns function:
{code}
def join(right: DataFrame, usingColumns: Seq[String], joinType: String): 
DataFrame
{code}
performs a null-unsafe join. It would be great if there were an option for a 
null-safe join.

  was:
Currently, the join-on-columns function:

def join(right: DataFrame, usingColumns: Seq[String], joinType: String): 
DataFrame

performs a null-unsafe join. It would be great if there were an option for a 
null-safe join.








[jira] [Created] (SPARK-13337) DataFrame join-on-columns function should support null-safe equal

2016-02-16 Thread Zhong Wang (JIRA)
Zhong Wang created SPARK-13337:
--

 Summary: DataFrame join-on-columns function should support 
null-safe equal
 Key: SPARK-13337
 URL: https://issues.apache.org/jira/browse/SPARK-13337
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.6.0
Reporter: Zhong Wang
Priority: Minor


Currently, the join-on-columns function:

def join(right: DataFrame, usingColumns: Seq[String], joinType: String): 
DataFrame

performs a null-unsafe join. It would be great if there were an option for a 
null-safe join.


