[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82450472
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
 #' @name read.json
 #' @method read.json default
 #' @note read.json since 1.6.0
-read.json.default <- function(path) {
+read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
+  read <- callJMethod(read, "options", options)
--- End diff --

Yeap, let me try to organise the unresolved comments here and in 
https://github.com/apache/spark/issues/15231 if there are any! Thank you.





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15239





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82445435
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
 #' @name read.json
 #' @method read.json default
 #' @note read.json since 1.6.0
-read.json.default <- function(path) {
+read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
+  read <- callJMethod(read, "options", options)
--- End diff --

Ah, thank you for the very detailed analysis and tests.
I think it would generally be great to match the Scala/Python behavior (though 
not merely for the sake of matching it) and have read include all path(s).

```r
> read.json(path = "hyukjin.json", path = "felix.json")
Error in dispatchFunc("read.json(path)", x, ...) :
  argument "x" is missing, with no default
```
This is because of the parameter hack.
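
For anyone unfamiliar with the hack, the failure can be reproduced with a plain 
R function whose first formal is `x` (a self-contained illustration, not 
SparkR's actual `dispatchFunc`):

```r
# Both arguments are named `path`, so both are swallowed by `...` and
# nothing is matched to the first formal `x`; touching `x` then fails with
# 'argument "x" is missing, with no default'.
f <- function(x, ...) x
f(path = "a", path = "b")
```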

```r
> read.df(path = "hyukjin.json", path = "felix.json", source = "json")
Error in f(x, ...) :
  formal argument "path" matched by multiple actual arguments
```
I think read.df is somewhat unique in that its first parameter is named 
`path` - this is either helpful (if we don't want to support multiple paths 
like this) or bad (the user can't specify multiple paths).

```r
> varargsToStrEnv("a", 1, 2, 3)

```
This case is somewhat dangerous - I think we end up passing a list of unnamed 
properties to the JVM side - it might be a good idea to add a check for the 
`zero-length variable name` case - perhaps you could open a JIRA on that?
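
For reference, a minimal sketch of such a guard (an assumption about how the 
check could look, not code from this PR; the real `varargsToStrEnv` in 
`utils.R` also validates the value types):

```r
varargsToStrEnv <- function(...) {
  pairs <- list(...)
  # Reject unnamed arguments up front, instead of failing later with
  # "attempt to use zero-length variable name" when assigning into the env.
  if (length(pairs) > 0 && (is.null(names(pairs)) || any(names(pairs) == ""))) {
    stop("All options should be named, e.g. varargsToStrEnv(key = \"value\").")
  }
  env <- new.env()
  for (name in names(pairs)) {
    # Simplified: convert every value to its string form.
    env[[name]] <- as.character(pairs[[name]])
  }
  env
}
```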





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82367394
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
 #' @name read.json
 #' @method read.json default
 #' @note read.json since 1.6.0
-read.json.default <- function(path) {
+read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
+  read <- callJMethod(read, "options", options)
--- End diff --

Oh, I see.

In R, it seems we can't set the path twice.

In Scala and Python, reading takes all paths, both those set as an option and 
those passed as arguments. For writing, the argument overwrites the path set 
as an option.


In more detail for R, it seems we can't specify the same keyword argument 
twice.

- `read.json()`

  **Duplicated keywords**

  ```r
  > read.json(path = "hyukjin.json", path = "felix.json")
  Error in dispatchFunc("read.json(path)", x, ...) :
argument "x" is missing, with no default
  ```

  **With a single keyword argument**

  ```r
  > collect(read.json("hyukjin.json", path = "felix.json"))
 NAME
  1 Felix
  ```

- `read.df()`

  **Duplicated keywords**

  ```r
  > read.df(path = "hyukjin.json", path = "felix.json", source = "json")
  Error in f(x, ...) :
formal argument "path" matched by multiple actual arguments
  ```

  **With a single keyword argument**

  ```r
  > read.df("hyukjin.json", path = "felix.json", source = "json")
  Error: length(x) > 0 is not TRUE
  ```

  In this case, it seems "hyukjin.json" was matched to the third argument, `schema`.


In the **With a single keyword argument** case, it seems `path` becomes 
`felix.json`. For example:
```r
> tmp <- function(path, ...) {
+ print(path)
+ }
>
> tmp("a", path = "b")
[1] "b"
```

For `...` arguments, it seems an exception is thrown when named and unnamed 
arguments are mixed, as below:

```r
> varargsToStrEnv("a", path="b")
Error in env[[name]] <- value :
  attempt to use zero-length variable name
```

However, it seems fine if the arguments are all unnamed or all named, as 
below:

```r
> varargsToStrEnv("a", 1, 2, 3)
> varargsToStrEnv(a="a", b=1, c=2, d=3)
```






[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82362700
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
 #' @name read.json
 #' @method read.json default
 #' @note read.json since 1.6.0
-read.json.default <- function(path) {
+read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
+  read <- callJMethod(read, "options", options)
--- End diff --

Let me test this first and will try to show together!





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82355751
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
 #' @name read.json
 #' @method read.json default
 #' @note read.json since 1.6.0
-read.json.default <- function(path) {
+read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
+  read <- callJMethod(read, "options", options)
--- End diff --

Oh, I haven't tested this yet, but the first `path` will take precedence, 
because in `setWriteOptions`,

```r
options <- varargsToStrEnv(...)
if (!is.null(path)) {
  options[["path"]] <- path
}
```

we set the first (positional) `path` last, overwriting the path already in the options.

It seems Scala works the same way:

```scala
def load(path: String): DataFrame = {
  option("path", path).load(Seq.empty: _*) // force invocation of `load(...varargs...)`
}
```

It overwrites the path set via `option` when it actually loads. Let me test 
this in R and leave another comment.







[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82352445
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
 #' @name read.json
 #' @method read.json default
 #' @note read.json since 1.6.0
-read.json.default <- function(path) {
+read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
+  read <- callJMethod(read, "options", options)
--- End diff --

This might be an existing problem, but it would go down a different code path 
in data sources, depending on their implementation...





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r82333423
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -841,8 +865,9 @@ setMethod("saveAsParquetFile",
 #' @note write.text since 2.0.0
 setMethod("write.text",
   signature(x = "SparkDataFrame", path = "character"),
-  function(x, path) {
+  function(x, path, mode = "error", ...) {
 write <- callJMethod(x@sdf, "write")
+write <- setWriteOptions(write, mode = mode, ...)
--- End diff --

I guess similarly here, what happens if someone calls the following?
```r
write.text(df, path = "a", path = "b")
```





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r81672198
  
--- Diff: R/pkg/R/generics.R ---
@@ -651,23 +651,25 @@ setGeneric("write.jdbc", function(x, url, tableName, mode = "error", ...) {
 
 #' @rdname write.json
 #' @export
-setGeneric("write.json", function(x, path) { standardGeneric("write.json") })
+setGeneric("write.json", function(x, path, mode = NULL, ...) { standardGeneric("write.json") })
--- End diff --

Oh, yes, sure.





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r81602120
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -55,6 +55,19 @@ setMethod("initialize", "SparkDataFrame", function(.Object, sdf, isCached) {
   .Object
 })
 
+#' Set options/mode and then return the write object
+#' @noRd
+setWriteOptions <- function(write, path = NULL, mode = 'error', ...) {
+  options <- varargsToStrEnv(...)
+  if (!is.null(path)) {
+    options[["path"]] <- path
+  }
+  jmode <- convertToJSaveMode(mode)
+  write <- callJMethod(write, "mode", jmode)
+  write <- callJMethod(write, "options", options)
+  write
--- End diff --

that's fine





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-10-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r81601610
  
--- Diff: R/pkg/R/generics.R ---
@@ -651,23 +651,25 @@ setGeneric("write.jdbc", function(x, url, tableName, mode = "error", ...) {
 
 #' @rdname write.json
 #' @export
-setGeneric("write.json", function(x, path) { standardGeneric("write.json") })
+setGeneric("write.json", function(x, path, mode = NULL, ...) { standardGeneric("write.json") })
--- End diff --

would it be better, and more generic, to declare these generics as 
`setGeneric("write.json", function(x, path, ...))` instead?





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r81331357
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -328,6 +328,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param path Path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

Hm, do you think it is okay as it is? I tried to make them look more 
consistent, but that didn't quite work out.





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r81071117
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -328,6 +328,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param path Path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

Thank you for your advice. I will try to deal with this as best I can!





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r81070754
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -328,6 +328,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param path Path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

right - when two functions share the same @rdname, they are documented on the 
same Rd page, and the CRAN checks require one and only one `@param ...` entry 
when either function has `...` as a parameter.

I haven't checked, but my guess is you need to add `@param ...` for `@rdname 
read.json` since it is new.
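
For illustration, a minimal sketch of the constraint (hypothetical roxygen, 
not the exact SparkR docs): two functions sharing `@rdname read.json` are 
rendered on one Rd page, so only one of them may carry the `@param ...` tag.

```r
#' @param path Path of file to read. A vector of multiple paths is allowed.
#' @param ... additional external data source specific named properties.
#' @rdname read.json
read.json.default <- function(path, ...) { }

#' @rdname read.json
#' @name jsonFile
jsonFile.default <- function(path) { }
# A second `@param ...` on jsonFile.default would trigger a duplicated-
# documentation warning from `R CMD check`.
```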






[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r80855269
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -55,6 +55,19 @@ setMethod("initialize", "SparkDataFrame", function(.Object, sdf, isCached) {
   .Object
 })
 
+#' Set options/mode and then return the write object
+#' @noRd
+setWriteOptions <- function(write, path = NULL, mode = 'error', ...) {
+  options <- varargsToStrEnv(...)
+  if (!is.null(path)) {
+    options[["path"]] <- path
+  }
+  jmode <- convertToJSaveMode(mode)
+  write <- callJMethod(write, "mode", jmode)
+  write <- callJMethod(write, "options", options)
+  write
--- End diff --

Yes, I think that sounds good. Actually, I was thinking of a single common 
method for options in both `read`/`write` (maybe in `utils.R`?) and two common 
methods for reading/writing. Would it be okay for me to try this in another PR, 
after this one and https://github.com/apache/spark/pull/15231?
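
A rough sketch of that refactoring idea (hypothetical helper name and 
placement, for the follow-up PR): since `DataFrameReader` and `DataFrameWriter` 
both expose an `options` method on the JVM side, one helper could serve both.

```r
# Hypothetical shared helper, e.g. in utils.R; `object` is either a
# DataFrameReader or a DataFrameWriter JVM handle.
setSourceOptions <- function(object, ...) {
  options <- varargsToStrEnv(...)
  callJMethod(object, "options", options)
}
```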





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r80850317
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -328,6 +328,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param path Path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

Yes, it was not. I am not very sure about this (I am not really used to this 
CRAN check). My guess is that the check combines the arguments of functions 
documented on the same Rd page. For example, the Parquet API is as below:

```r
 #' @param path path of file to read. A vector of multiple paths is allowed.
 #' @return SparkDataFrame
 #' @rdname read.parquet
...
read.parquet.default <- function(path, ...) {
```

```r
 #' @param ... argument(s) passed to the method.
 #' @rdname read.parquet
...
 parquetFile.default <- function(...) {
```

It complained about duplicated `@param`s when I tried to add `@param ...` 
to `read.parquet.default`, so I ended up removing it again.

On the other hand, for JSON APIs,

```r
 #' @rdname read.json
 #' @name jsonFile
...
 jsonFile.default <- function(path) {
```

```r
#' @param path Path of file to read. A vector of multiple paths is allowed.
#' @param ... additional external data source specific named properties.
#' @return SparkDataFrame
#' @rdname read.json
...
read.json.default <- function(path, ...) {
```

It seems `jsonFile` does not document `@param`, so I think it passed.

If you meant a different problem, could you please point me to it?





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r80841590
  
--- Diff: R/pkg/R/utils.R ---
@@ -342,7 +342,8 @@ varargsToStrEnv <- function(...) {
   for (name in names(pairs)) {
     value <- pairs[[name]]
     if (!(is.logical(value) || is.numeric(value) || is.character(value) || is.null(value))) {
-      stop("value[", value, "] in key[", name, "] is not convertable to string.")
+      stop(paste0("Unsupported type for ", name, " : ", class(value),
+                  ". Supported types are logical, numeric, character and null."))
--- End diff --

`NULL` instead of `null`





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r80841566
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -328,6 +328,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param path Path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

this is odd - was it not complaining about this missing?





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r80841401
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -55,6 +55,19 @@ setMethod("initialize", "SparkDataFrame", function(.Object, sdf, isCached) {
   .Object
 })
 
+#' Set options/mode and then return the write object
+#' @noRd
+setWriteOptions <- function(write, path = NULL, mode = 'error', ...) {
+  options <- varargsToStrEnv(...)
+  if (!is.null(path)) {
+    options[["path"]] <- path
+  }
+  jmode <- convertToJSaveMode(mode)
+  write <- callJMethod(write, "mode", jmode)
+  write <- callJMethod(write, "options", options)
+  write
--- End diff --

do you think it makes sense to have a generic write method? i.e. one that 
includes `write <- callJMethod(x@sdf, "write")` and 
`invisible(callJMethod(write, source, path))`
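
Combining those two snippets, such a helper might look roughly like this (a 
sketch under the assumption that `source` names a `DataFrameWriter` method 
such as "json" or "parquet"; `writeToSource` is a hypothetical name):

```r
writeToSource <- function(x, source, path = NULL, mode = "error", ...) {
  write <- callJMethod(x@sdf, "write")
  write <- setWriteOptions(write, path = path, mode = mode, ...)
  # e.g. source = "json" invokes DataFrameWriter.json(path) on the JVM side
  invisible(callJMethod(write, source, path))
}
```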





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15239#discussion_r80841143
  
--- Diff: R/pkg/R/utils.R ---
@@ -334,6 +334,27 @@ varargsToEnv <- function(...) {
   env
 }
 
+# Utility function to capture the varargs into environment object but all values are converted
+# into string.
+varargsToStrEnv <- function(...) {
+  pairs <- list(...)
+  env <- new.env()
+  for (name in names(pairs)) {
+    value <- pairs[[name]]
+    if (!(is.logical(value) || is.numeric(value) || is.character(value) || is.null(value))) {
+      stop("value[", value, "] in key[", name, "] is not convertable to string.")
--- End diff --

I think "... logical, character, numeric and NULL" as these are the R names.





[GitHub] spark pull request #15239: [SPARK-17665][SPARKR] Support options/mode all fo...

2016-09-26 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/15239

[SPARK-17665][SPARKR] Support options/mode all for read/write APIs and options in other types

## What changes were proposed in this pull request?

This PR includes the changes below:

  - Support `mode`/`options` in `read.parquet`, `write.parquet`, 
`read.orc`, `write.orc`, `read.text`, `write.text`, `read.json` and 
`write.json` APIs

  - Support other types (logical, numeric and string) as options for 
`write.df`, `read.df`, `read.parquet`, `write.parquet`, `read.orc`, 
`write.orc`, `read.text`, `write.text`, `read.json` and `write.json`
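
For example, with these changes calls like the following become possible (a 
sketch; `primitivesAsString` and `compression` are standard JSON data source 
options, and the logical value is converted to a string by `varargsToStrEnv`):

```r
# Pass a data source option (logical) directly to read.json, and a save
# mode plus a write option to write.json.
df <- read.json("people.json", primitivesAsString = TRUE)
write.json(df, "people-out", mode = "overwrite", compression = "gzip")
```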


## How was this patch tested?

Unit tests in `test_sparkSQL.R` / `utils.R`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-17665

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15239


commit 7692fbc3cbc1ab808b74321825dac3b90b8d5067
Author: hyukjinkwon 
Date:   2016-09-26T06:03:33Z

Support other types for options

commit a9ffac85b091e8e4f88a6cdec5e8c462c7bf9125
Author: hyukjinkwon 
Date:   2016-09-26T07:23:40Z

Add mode and options for read/write.parquet read/write.json read/write.orc read/write.text

commit 924ec77e6e57386859ac980eb829543019fb02ba
Author: hyukjinkwon 
Date:   2016-09-26T07:31:21Z

Fix indentation

commit a9f9df856ee9a403634c017b1de6dddf013e5623
Author: hyukjinkwon 
Date:   2016-09-26T08:05:40Z

Fix utils and add the tests for utils



