[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-05 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17159
  
I think it's a good idea to get SparkR `rbind` to match the behavior of R 
`data.frame` `rbind`.
We should then clearly indicate the difference between SparkR `union` and 
`rbind`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-05 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17159
  
@felixcheung OK, I did not know it was by design. The `union` behavior does 
seem similar to SQL in R (via `sqldf`), but as you pointed out, the overloaded 
method `rbind` differs from base R, which checks name consistency. See the 
examples below. Should I make the change to `rbind`, or leave it as is and 
close this PR? Thanks.

```
df <- data.frame(name = c("Michael", "Andy", "Justin"), age = c(1, 30, 19))
df2 <- df
names(df2)[1] <- "name2"

# 1. SQL
library(sqldf)
query <- "select * from df union all select * from df2"
sqldf(query)

     name age
1 Michael   1
2    Andy  30
3  Justin  19
4 Michael   1
5    Andy  30
6  Justin  19

# 2. rbind
rbind(df, df2)
Error in match.names(clabs, names(xi)) : 
  names do not match previous names
```
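
The contrast above can be sketched in plain Python. This is only an illustration under stated assumptions: `Table`, `union_all`, and `rbind_checked` are hypothetical names, not SparkR or sqldf APIs, and the name check is simplified to "same names in the same order" (base R's `rbind` also accepts reordered columns and matches them by name).

```python
from typing import List, Tuple

# A table is modeled as (column names, rows); a stand-in, not a Spark or R type.
Table = Tuple[List[str], List[tuple]]


def union_all(left: Table, right: Table) -> Table:
    """Append rows by position and keep the left-hand names (SQL UNION ALL style)."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("tables must have the same number of columns")
    return left_cols, left_rows + right_rows


def rbind_checked(left: Table, right: Table) -> Table:
    """Append rows only when column names agree (simplified: same names, same order)."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if left_cols != right_cols:
        raise ValueError("names do not match previous names")
    return left_cols, left_rows + right_rows


df = (["name", "age"], [("Michael", 1), ("Andy", 30), ("Justin", 19)])
df2 = (["name2", "age"], df[1])  # same rows, first column renamed

cols, rows = union_all(df, df2)  # succeeds: rows of df2 land under df's names
try:
    rbind_checked(df, df2)
    rbind_error = None
except ValueError as exc:
    rbind_error = str(exc)       # the name mismatch is reported instead
```

The positional version never looks at the right-hand names, which is why the renamed column goes unnoticed.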





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17159
  
Hmm... this is somewhat by design: `union` can take in 2 DataFrames that 
might not match in column names or types. In that case, values in one of the 
DataFrames will be coerced to make things fit:
```
>>> d = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
>>> l = spark.createDataFrame([(1, 2)])
>>> d.union(l).head(2)
[Row(age=1, name=u'Alice'), Row(age=1, name=u'2')]

>>> l.dtypes
[('_1', 'bigint'), ('_2', 'bigint')]
>>> d.dtypes
[('age', 'bigint'), ('name', 'string')]
```

Do you see this as something that might be unexpected for R users (in which 
case `rbind` might be the overload to look into) or SQL users (documented as 
equivalent to SQL UNION ALL)?
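
A rough pure-Python sketch of the coercion described above, assuming a deliberately simplified rule: when the values at one column position have different Python types, every value at that position is cast to a string. Spark's real type-widening rules are richer than this, and `union_with_coercion` is a hypothetical helper, not a Spark function.

```python
from typing import List


def union_with_coercion(left: List[tuple], right: List[tuple]) -> List[tuple]:
    """Positional union; a column holding mixed types falls back to strings."""
    rows = left + right
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # A column is "mixed" when its values do not all share one type.
    mixed = [len({type(row[i]) for row in rows}) > 1 for i in range(width)]
    return [
        tuple(str(value) if mixed[i] else value for i, value in enumerate(row))
        for row in rows
    ]


d_rows = [(1, "Alice")]  # columns (age, name), as in the PySpark example
l_rows = [(1, 2)]        # columns (_1, _2), both integers

result = union_with_coercion(d_rows, l_rows)
```

Here the integer `2` ends up in the string-typed second column, mirroring the `name=u'2'` row in the example.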






[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17159
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17159
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73897/
Test PASSed.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17159
  
**[Test build #73897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73897/testReport)**
 for PR 17159 at commit 
[`ef84501`](https://github.com/apache/spark/commit/ef8450157fb6c6535f1608899bc3898974ba8454).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17159
  
**[Test build #73897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73897/testReport)**
 for PR 17159 at commit 
[`ef84501`](https://github.com/apache/spark/commit/ef8450157fb6c6535f1608899bc3898974ba8454).





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17159
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17159
  
**[Test build #73895 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73895/testReport)**
 for PR 17159 at commit 
[`293dc35`](https://github.com/apache/spark/commit/293dc35fd203c0926aeb1e0b483372eb525aeec3).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17159
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73895/
Test FAILed.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17159
  
**[Test build #73895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73895/testReport)**
 for PR 17159 at commit 
[`293dc35`](https://github.com/apache/spark/commit/293dc35fd203c0926aeb1e0b483372eb525aeec3).





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17159
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73888/
Test FAILed.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17159
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17159
  
**[Test build #73888 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73888/testReport)**
 for PR 17159 at commit 
[`7697806`](https://github.com/apache/spark/commit/769780697d81f91e911b5af516c24b8b4291f27d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17159
  
**[Test build #73888 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73888/testReport)**
 for PR 17159 at commit 
[`7697806`](https://github.com/apache/spark/commit/769780697d81f91e911b5af516c24b8b4291f27d).





[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-03 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17159
  
The current implementation accepts data frames with different schemas. See 
the issue below:
```
df <- createDataFrame(data.frame(name = c("Michael", "Andy", "Justin"),
                                 age = c(1, 30, 19)))
union(df, df[, c(2, 1)])
     name     age
1 Michael     1.0
2    Andy    30.0
3  Justin    19.0
4     1.0 Michael
```
@felixcheung 
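
One way to avoid the mix-up shown above is to reorder the second data frame's columns to match the first before appending. A minimal plain-Python sketch of that idea follows; `align_to` is a hypothetical helper, not part of SparkR, and it assumes column names are unique.

```python
from typing import List


def align_to(columns: List[str], other_columns: List[str],
             other_rows: List[tuple]) -> List[tuple]:
    """Reorder other_rows so their values follow the order given by `columns`."""
    if sorted(columns) != sorted(other_columns):
        raise ValueError("column names differ: %r vs %r" % (columns, other_columns))
    # Position of each target column inside the other table.
    index = [other_columns.index(name) for name in columns]
    return [tuple(row[i] for i in index) for row in other_rows]


left_cols = ["name", "age"]
left_rows = [("Michael", 1.0), ("Andy", 30.0), ("Justin", 19.0)]
swapped_cols = ["age", "name"]  # same data with the columns reversed
swapped_rows = [(1.0, "Michael"), (30.0, "Andy"), (19.0, "Justin")]

combined = left_rows + align_to(left_cols, swapped_cols, swapped_rows)
```

After alignment, each appended row carries the name in the first position and the age in the second, so no value lands under the wrong column.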

