[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17159 I think it's a good idea to get SparkR `rbind` to match behavior of R `data.frame` `rbind`. We should clearly indicate the difference between SparkR `union` and `rbind` then --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17159

@felixcheung OK, I did not know it was by design. It does seem that the `union` behavior is similar to SQL in R (via `sqldf`), but as you pointed out, the overloaded method `rbind` differs from base R, which checks name consistency. See the examples below. Should I make the change to `rbind`, or leave it as is and close this PR? Thanks.

```
df <- data.frame(name = c("Michael", "Andy", "Justin"), age = c(1, 30, 19))
df2 <- df
names(df2)[1] <- "name2"

# 1. SQL
library(sqldf)
query <- "select * from df union all select * from df2"
sqldf(query)
     name age
1 Michael   1
2    Andy  30
3  Justin  19
4 Michael   1
5    Andy  30
6  Justin  19

# 2. rbind
rbind(df, df2)
Error in match.names(clabs, names(xi)) :
  names do not match previous names
```
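For readers without an R session, the name-matching behavior of base R `rbind` shown above can be approximated in plain Python. This is only a sketch: `rbind_by_name` is a hypothetical helper, frames are modeled as a list of column names plus a list of row tuples, and nothing here is SparkR or PySpark API.

```python
# Illustrative stand-in for base R rbind on data frames; not Spark code.

def rbind_by_name(cols_a, rows_a, cols_b, rows_b):
    """Append rows_b to rows_a, matching columns by name rather than position."""
    if sorted(cols_a) != sorted(cols_b):
        # Same wording as the R error quoted in the comment above.
        raise ValueError("names do not match previous names")
    # Position of each of the first frame's columns inside the second frame.
    order = [cols_b.index(c) for c in cols_a]
    return cols_a, rows_a + [tuple(r[i] for i in order) for r in rows_b]


cols = ["name", "age"]
rows = [("Michael", 1), ("Andy", 30), ("Justin", 19)]

# Same names in a different order: rows are realigned before appending.
_, combined = rbind_by_name(cols, rows, ["age", "name"], [(a, n) for n, a in rows])

# A renamed column is rejected, as in the rbind(df, df2) call above.
try:
    rbind_by_name(cols, rows, ["name2", "age"], rows)
    renamed_result = "accepted"
except ValueError as err:
    renamed_result = str(err)
```

A positional `UNION ALL`, by contrast, would accept the renamed frame and keep the first frame's column names, which is the difference under discussion.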
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17159

hmm... this is somewhat by design: `union` can take two DataFrames whose column names or types do not match. In that case, values in one of the DataFrames are coerced to make things fit.

```
>>> d = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
>>> l = spark.createDataFrame([(1, 2)])
>>> d.union(l).head(2)
[Row(age=1, name=u'Alice'), Row(age=1, name=u'2')]
>>> l.dtypes
[('_1', 'bigint'), ('_2', 'bigint')]
>>> d.dtypes
[('age', 'bigint'), ('name', 'string')]
```

Do you see this as something that might be unexpected for R users (in which case `rbind` might be the overload to look into) or for SQL users (`union` is documented as equivalent to SQL UNION ALL)?
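The coercion in the example above can be mimicked without a Spark session. The sketch below is a deliberately simplified model: `widen` and `union_by_position` are hypothetical names, and the rule "fall back to string when the two types differ" is an assumption made for illustration, not Spark's actual type-coercion logic.

```python
# Simplified model of a positional union with type widening; not the Spark API.

def widen(type_a, type_b):
    """Pick a common column type; assumed rule: differing types become string."""
    return type_a if type_a == type_b else "string"


def union_by_position(schema_a, rows_a, schema_b, rows_b):
    """Union two frames by column position, keeping the first frame's names."""
    if len(schema_a) != len(schema_b):
        raise ValueError("column counts differ")
    names = [name for name, _ in schema_a]
    types = [widen(ta, tb) for (_, ta), (_, tb) in zip(schema_a, schema_b)]

    def cast(row):
        # Only the string fallback needs an explicit conversion in this model.
        return tuple(str(v) if t == "string" else v for v, t in zip(row, types))

    return list(zip(names, types)), [cast(r) for r in rows_a + rows_b]


# Schemas and rows taken from the dtypes and head(2) output quoted above.
d_schema, d_rows = [("age", "bigint"), ("name", "string")], [(1, "Alice")]
l_schema, l_rows = [("_1", "bigint"), ("_2", "bigint")], [(1, 2)]

schema, rows = union_by_position(d_schema, d_rows, l_schema, l_rows)
```

Under these assumptions the integer `2` from the second frame ends up as the string `'2'` in the `name` column, matching the `Row(age=1, name=u'2')` shown in the comment.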
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17159 Merged build finished. Test PASSed.
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73897/
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17159

**[Test build #73897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73897/testReport)** for PR 17159 at commit [`ef84501`](https://github.com/apache/spark/commit/ef8450157fb6c6535f1608899bc3898974ba8454).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17159 **[Test build #73897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73897/testReport)** for PR 17159 at commit [`ef84501`](https://github.com/apache/spark/commit/ef8450157fb6c6535f1608899bc3898974ba8454).
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17159 Merged build finished. Test FAILed.
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17159

**[Test build #73895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73895/testReport)** for PR 17159 at commit [`293dc35`](https://github.com/apache/spark/commit/293dc35fd203c0926aeb1e0b483372eb525aeec3).

* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17159 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73895/
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17159 **[Test build #73895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73895/testReport)** for PR 17159 at commit [`293dc35`](https://github.com/apache/spark/commit/293dc35fd203c0926aeb1e0b483372eb525aeec3).
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17159 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73888/
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17159 Merged build finished. Test FAILed.
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17159

**[Test build #73888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73888/testReport)** for PR 17159 at commit [`7697806`](https://github.com/apache/spark/commit/769780697d81f91e911b5af516c24b8b4291f27d).

* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17159 **[Test build #73888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73888/testReport)** for PR 17159 at commit [`7697806`](https://github.com/apache/spark/commit/769780697d81f91e911b5af516c24b8b4291f27d).
[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17159

The current implementation accepts data frames with different schemas. See issues below:

```
df <- createDataFrame(data.frame(name = c("Michael", "Andy", "Justin"), age = c(1, 30, 19)))
union(df, df[, c(2, 1)])
     name     age
1 Michael     1.0
2    Andy    30.0
3  Justin    19.0
4     1.0 Michael
```

@felixcheung
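One way to picture the kind of check the PR title asks for is a schema comparison performed before the rows are combined. The plain-Python sketch below is an assumption about what such a guard could look like, not the change actually made in this PR; `schemas_match` and `checked_union` are hypothetical names, and a schema is modeled as an ordered list of `(name, type)` pairs.

```python
# Hypothetical guard in front of a positional union; not the SparkR implementation.

def schemas_match(schema_a, schema_b):
    """True when both schemas list the same (name, type) pairs in the same order."""
    return list(schema_a) == list(schema_b)


def checked_union(schema_a, rows_a, schema_b, rows_b):
    """Combine rows only when the two schemas are identical."""
    if not schemas_match(schema_a, schema_b):
        raise ValueError(f"schema mismatch: {schema_a} vs {schema_b}")
    return schema_a, rows_a + rows_b


schema = [("name", "string"), ("age", "double")]
rows = [("Michael", 1.0), ("Andy", 30.0), ("Justin", 19.0)]

# Mirrors df[, c(2, 1)] from the example above: same columns, swapped order.
swapped_schema = [schema[1], schema[0]]
swapped_rows = [(age, name) for name, age in rows]

try:
    checked_union(schema, rows, swapped_schema, swapped_rows)
    outcome = "accepted"
except ValueError:
    outcome = "rejected"
```

With this guard the swapped frame is rejected instead of producing a row such as `1.0 Michael` under the `name` and `age` headers.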