[GitHub] spark pull request #22031: [TODO][SPARK-23932][SQL] Higher order function zi...

2018-08-07 Thread crafty-coder
Github user crafty-coder commented on a diff in the pull request:

https://github.com/apache/spark/pull/22031#discussion_r208387111
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
 ---
@@ -442,3 +442,93 @@ case class ArrayAggregate(
 
   override def prettyName: String = "aggregate"
 }
+
+/**
+ * Merges two arrays, element-wise, into a single array using a function.
+ * This is similar to a `zip` in functional programming.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), x -> x + 1);
--- End diff --

The examples are not accurate.

You could use something like:

```
 > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
  array(('a', 1), ('b', 2), ('c', 3))

 > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y);
  array(4, 6)

 > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
  array('ad', 'be', 'cf')
```
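
Independently of the SQL examples, the null-padding rule described in the usage string can be sketched in plain Scala. This is only an illustration: `zipWithPad` is a hypothetical helper, not part of Spark's API, and `Option`/`None` stand in for SQL null.

```scala
// Hypothetical helper sketching zip_with's padding rule: the shorter
// sequence is padded (here with None) to the length of the longer one
// before the function is applied pairwise.
def zipWithPad[A, B, C](xs: Seq[A], ys: Seq[B])(f: (Option[A], Option[B]) => C): Seq[C] =
  (0 until math.max(xs.length, ys.length)).map(i => f(xs.lift(i), ys.lift(i)))

// Mirrors the second SQL example: zip_with(array(1, 2), array(3, 4), (x, y) -> x + y)
val sums = zipWithPad(Seq(1, 2), Seq(3, 4))((x, y) => x.getOrElse(0) + y.getOrElse(0))
// sums == Seq(4, 6)
```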


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...

2018-07-25 Thread crafty-coder
Github user crafty-coder commented on the issue:

https://github.com/apache/spark/pull/20949
  
@HyukjinKwon and @MaxGekk thanks for your help in this PR!

My JIRA Id is also **crafty-coder**



---




[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...

2018-07-18 Thread crafty-coder
Github user crafty-coder commented on a diff in the pull request:

https://github.com/apache/spark/pull/20949#discussion_r203306174
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -512,6 +513,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
 }
   }
 
+  test("SPARK-19018: Save csv with custom charset") {
+
+// scalastyle:off nonascii
+val content = "µß áâä ÁÂÄ"
+// scalastyle:on nonascii
+
+Seq("iso-8859-1", "utf-8", "utf-16", "utf-32", "windows-1250").foreach { encoding =>
+  withTempDir { dir =>
+val csvDir = new File(dir, "csv")
+
+val originalDF = Seq(content).toDF("_c0").repartition(1)
+originalDF.write
+  .option("encoding", encoding)
+  .csv(csvDir.getCanonicalPath)
+
+csvDir.listFiles().filter(_.getName.endsWith("csv")).foreach({ csvFile =>
--- End diff --

What do you mean?


---




[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...

2018-07-18 Thread crafty-coder
Github user crafty-coder commented on a diff in the pull request:

https://github.com/apache/spark/pull/20949#discussion_r203286908
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -512,6 +513,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
 }
   }
 
+  test("SPARK-19018: Save csv with custom charset") {
+
+// scalastyle:off nonascii
+val content = "µß áâä ÁÂÄ"
+// scalastyle:on nonascii
+
+Seq("iso-8859-1", "utf-8", "utf-16", "utf-32", "windows-1250").foreach { encoding =>
+  withTempDir { dir =>
+val csvDir = new File(dir, "csv")
+
+val originalDF = Seq(content).toDF("_c0").repartition(1)
+originalDF.write
+  .option("encoding", encoding)
+  .csv(csvDir.getCanonicalPath)
+
+csvDir.listFiles().filter(_.getName.endsWith("csv")).foreach({ csvFile =>
+  val readback = Files.readAllBytes(csvFile.toPath)
+  val expected = (content + "\n").getBytes(Charset.forName(encoding))
--- End diff --

Good Point!
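
For reference, the quoted test's expectation — the raw bytes on disk equal the content plus a trailing newline, encoded in the chosen charset — can be exercised without Spark. This is a plain-Scala sketch of that check, not the suite's code:

```scala
import java.nio.charset.Charset

// Same non-ASCII sample as the quoted test.
val content = "µß áâä ÁÂÄ"

Seq("iso-8859-1", "utf-8", "utf-16", "utf-32", "windows-1250").foreach { enc =>
  val cs = Charset.forName(enc)
  // What the test expects to find on disk: content plus newline, encoded.
  val expected = (content + "\n").getBytes(cs)
  // Sanity check: every charset in the list can represent the sample,
  // so decoding the expected bytes recovers the original line.
  assert(new String(expected, cs) == content + "\n", s"lossy encoding: $enc")
}
```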


---




[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...

2018-07-17 Thread crafty-coder
Github user crafty-coder commented on a diff in the pull request:

https://github.com/apache/spark/pull/20949#discussion_r203023263
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -512,6 +512,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
 }
   }
 
+  test("Save csv with custom charset") {
+Seq("iso-8859-1", "utf-8", "windows-1250").foreach { encoding =>
+  withTempDir { dir =>
+val csvDir = new File(dir, "csv").getCanonicalPath
+// scalastyle:off
+val originalDF = Seq("µß áâä ÁÂÄ").toDF("_c0")
+// scalastyle:on
+originalDF.write
+  .option("header", "false")
--- End diff --

My bad, there is no reason. It's fixed on the next commit.


---




[GitHub] spark issue #21247: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 ...

2018-06-22 Thread crafty-coder
Github user crafty-coder commented on the issue:

https://github.com/apache/spark/pull/21247
  
Hi @holdenk, I'm working on a similar PR (https://github.com/apache/spark/pull/20949) to allow setting the encoding when writing csv files.

It would be strange if you could set the encoding when saving a json file but not when saving a csv.

It would be nice if you could take a look!

Thank you 🐱


---




[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...

2018-05-06 Thread crafty-coder
Github user crafty-coder commented on the issue:

https://github.com/apache/spark/pull/20949
  
I would say this change has value on its own.

At the moment the csv reader applies the charset config but the csv writer ignores it, which I think is a bit confusing.
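
As an illustration of the symmetry being asked for, here is a Spark-free sketch using java.nio: bytes are produced with an explicit charset on the writer side and decoded with the same charset on the reader side. The file name and sample text are made up for the example.

```scala
import java.nio.charset.Charset
import java.nio.file.Files

val cs = Charset.forName("windows-1250")
val line = "µß áâä ÁÂÄ\n"

// Writer side: honour the requested charset when producing bytes.
val path = Files.createTempFile("csv-demo", ".csv")
Files.write(path, line.getBytes(cs))

// Reader side: apply the same charset when decoding, as the csv reader
// already does with its encoding option.
val readBack = new String(Files.readAllBytes(path), cs)
assert(readBack == line)
Files.delete(path)
```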



---




[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...

2018-03-30 Thread crafty-coder
GitHub user crafty-coder opened a pull request:

https://github.com/apache/spark/pull/20949

[SPARK-19018][SQL] Add support for custom encoding on csv writer

## What changes were proposed in this pull request?

Add support for custom encoding on csv writer, see https://issues.apache.org/jira/browse/SPARK-19018

## How was this patch tested?

Added two unit tests in CSVSuite


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/crafty-coder/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20949


commit b9a7bf03b312da151e1d7e37338092bbf5bcb38a
Author: crafty-coder <carlospb86@...>
Date:   2018-03-30T19:35:04Z

[SPARK-19018][SQL] Add support for custom encoding on csv writer




---
