spark git commit: [SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR

2017-10-26 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 3e77b7481 -> aa023fddb


[SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR

## What changes were proposed in this pull request?

This PR proposes to revive the `stringsAsFactors` option in the `collect` API, which was
mistakenly removed in
https://github.com/apache/spark/commit/71a138cd0e0a14e8426f97877e3b52a562bbd02c.

In short, it casts `character` to `factor` when the condition
`stringsAsFactors && is.character(vec)` holds during primitive type conversion.
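
The condition above can be illustrated in plain R, without a Spark session. This is only a minimal sketch: `convert_column` is a hypothetical helper written for illustration and is not part of SparkR; it simply applies the same check to a single vector.

```r
# Hypothetical helper (not SparkR code): applies the same condition the patch
# adds to collect(), but to a single in-memory vector.
convert_column <- function(vec, stringsAsFactors = FALSE) {
  # Only character vectors are converted, and only when the option is set.
  if (is.character(vec) && stringsAsFactors) {
    vec <- as.factor(vec)
  }
  vec
}

species <- c("setosa", "versicolor", "setosa")

class(convert_column(species))                               # "character"
class(convert_column(species, stringsAsFactors = TRUE))      # "factor"
class(convert_column(c(1.5, 2.5), stringsAsFactors = TRUE))  # "numeric"
```

Non-character columns pass through unchanged even when the option is enabled, which is why the check tests `is.character(vec)` as well as the flag.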

## How was this patch tested?

Unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`.

Author: hyukjinkwon 

Closes #19551 from HyukjinKwon/SPARK-17902.

(cherry picked from commit a83d8d5adcb4e0061e43105767242ba9770dda96)
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aa023fdd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aa023fdd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aa023fdd

Branch: refs/heads/branch-2.1
Commit: aa023fddb0abb6cf8ded94ac695ba7b0edb02022
Parents: 3e77b74
Author: hyukjinkwon 
Authored: Thu Oct 26 20:54:36 2017 +0900
Committer: hyukjinkwon 
Committed: Thu Oct 26 20:55:14 2017 +0900

--
 R/pkg/R/DataFrame.R                   | 3 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 6 ++++++
 2 files changed, 9 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/aa023fdd/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index d0f0979..5899fa8 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1173,6 +1173,9 @@ setMethod("collect",
   vec <- do.call(c, col)
   stopifnot(class(vec) != "list")
   class(vec) <- PRIMITIVE_TYPES[[colType]]
+  if (is.character(vec) && stringsAsFactors) {
+    vec <- as.factor(vec)
+  }
   df[[colIndex]] <- vec
 } else {
   df[[colIndex]] <- col

http://git-wip-us.apache.org/repos/asf/spark/blob/aa023fdd/R/pkg/tests/fulltests/test_sparkSQL.R
--
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index fedca67..0b88e47 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -417,6 +417,12 @@ test_that("create DataFrame with different data types", {
   expect_equal(collect(df), data.frame(l, stringsAsFactors = FALSE))
 })
 
+test_that("SPARK-17902: collect() with stringsAsFactors enabled", {
+  df <- suppressWarnings(collect(createDataFrame(iris), stringsAsFactors = TRUE))
+  expect_equal(class(iris$Species), class(df$Species))
+  expect_equal(iris$Species, df$Species)
+})
+
 test_that("SPARK-17811: can create DataFrame containing NA as date and time", {
   df <- data.frame(
 id = 1:2,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR

2017-10-26 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/master 3073344a2 -> a83d8d5ad


[SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR

## What changes were proposed in this pull request?

This PR proposes to revive the `stringsAsFactors` option in the `collect` API, which was
mistakenly removed in
https://github.com/apache/spark/commit/71a138cd0e0a14e8426f97877e3b52a562bbd02c.

In short, it casts `character` to `factor` when the condition
`stringsAsFactors && is.character(vec)` holds during primitive type conversion.

## How was this patch tested?

Unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`.

Author: hyukjinkwon 

Closes #19551 from HyukjinKwon/SPARK-17902.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a83d8d5a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a83d8d5a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a83d8d5a

Branch: refs/heads/master
Commit: a83d8d5adcb4e0061e43105767242ba9770dda96
Parents: 3073344
Author: hyukjinkwon 
Authored: Thu Oct 26 20:54:36 2017 +0900
Committer: hyukjinkwon 
Committed: Thu Oct 26 20:54:36 2017 +0900

--
 R/pkg/R/DataFrame.R                   | 3 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 6 ++++++
 2 files changed, 9 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a83d8d5a/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 176bb3b..aaa3349 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1191,6 +1191,9 @@ setMethod("collect",
   vec <- do.call(c, col)
   stopifnot(class(vec) != "list")
   class(vec) <- PRIMITIVE_TYPES[[colType]]
+  if (is.character(vec) && stringsAsFactors) {
+    vec <- as.factor(vec)
+  }
   df[[colIndex]] <- vec
 } else {
   df[[colIndex]] <- col

http://git-wip-us.apache.org/repos/asf/spark/blob/a83d8d5a/R/pkg/tests/fulltests/test_sparkSQL.R
--
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index 4382ef2..0c8118a 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -499,6 +499,12 @@ test_that("create DataFrame with different data types", {
   expect_equal(collect(df), data.frame(l, stringsAsFactors = FALSE))
 })
 
+test_that("SPARK-17902: collect() with stringsAsFactors enabled", {
+  df <- suppressWarnings(collect(createDataFrame(iris), stringsAsFactors = TRUE))
+  expect_equal(class(iris$Species), class(df$Species))
+  expect_equal(iris$Species, df$Species)
+})
+
 test_that("SPARK-17811: can create DataFrame containing NA as date and time", {
   df <- data.frame(
 id = 1:2,





spark git commit: [SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR

2017-10-26 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 d2dc175a1 -> 24fe7ccba


[SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR

## What changes were proposed in this pull request?

This PR proposes to revive the `stringsAsFactors` option in the `collect` API, which was
mistakenly removed in
https://github.com/apache/spark/commit/71a138cd0e0a14e8426f97877e3b52a562bbd02c.

In short, it casts `character` to `factor` when the condition
`stringsAsFactors && is.character(vec)` holds during primitive type conversion.

## How was this patch tested?

Unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`.

Author: hyukjinkwon 

Closes #19551 from HyukjinKwon/SPARK-17902.

(cherry picked from commit a83d8d5adcb4e0061e43105767242ba9770dda96)
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24fe7ccb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24fe7ccb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24fe7ccb

Branch: refs/heads/branch-2.2
Commit: 24fe7ccbacd913c19fa40199fd5511aaf55c6bfa
Parents: d2dc175
Author: hyukjinkwon 
Authored: Thu Oct 26 20:54:36 2017 +0900
Committer: hyukjinkwon 
Committed: Thu Oct 26 20:55:00 2017 +0900

--
 R/pkg/R/DataFrame.R                   | 3 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 6 ++++++
 2 files changed, 9 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/24fe7ccb/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 3859fa8..c0a954d 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1174,6 +1174,9 @@ setMethod("collect",
   vec <- do.call(c, col)
   stopifnot(class(vec) != "list")
   class(vec) <- PRIMITIVE_TYPES[[colType]]
+  if (is.character(vec) && stringsAsFactors) {
+    vec <- as.factor(vec)
+  }
   df[[colIndex]] <- vec
 } else {
   df[[colIndex]] <- col

http://git-wip-us.apache.org/repos/asf/spark/blob/24fe7ccb/R/pkg/tests/fulltests/test_sparkSQL.R
--
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index 12d8fef..50c60fe 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -483,6 +483,12 @@ test_that("create DataFrame with different data types", {
   expect_equal(collect(df), data.frame(l, stringsAsFactors = FALSE))
 })
 
+test_that("SPARK-17902: collect() with stringsAsFactors enabled", {
+  df <- suppressWarnings(collect(createDataFrame(iris), stringsAsFactors = TRUE))
+  expect_equal(class(iris$Species), class(df$Species))
+  expect_equal(iris$Species, df$Species)
+})
+
 test_that("SPARK-17811: can create DataFrame containing NA as date and time", {
   df <- data.frame(
 id = 1:2,

