[GitHub] spark pull request #22031: [TODO][SPARK-23932][SQL] Higher order function zi...
Github user crafty-coder commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22031#discussion_r208387111

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,93 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Transform elements in an array using the transform function. This is similar to
    + * a `map` in functional programming.
    + */
    +// scalastyle:off line.size.limit
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 2, 3), x -> x + 1);
    --- End diff --

    The examples are not accurate. You could use something like:

    ```
    > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
     array(('a', 1), ('b', 2), ('c', 3))
    > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y);
     array(4, 6)
    > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
     array('ad', 'be', 'cf')
    ```

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
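The null-padding behaviour described in the usage string can be sketched in plain Scala. This is a standalone model, not Spark's actual implementation: SQL `NULL` is represented here by `Option`/`None`, and the names `ZipWithSketch`/`zipWith` are illustrative.

```scala
// Standalone sketch of zip_with's semantics (not Spark's implementation):
// both inputs are conceptually extended to the longer length, with None
// standing in for SQL NULL, before the function is applied element-wise.
object ZipWithSketch {
  def zipWith[A, B, C](xs: Seq[A], ys: Seq[B])(f: (Option[A], Option[B]) => C): Seq[C] = {
    val len = math.max(xs.length, ys.length)
    // Seq#lift returns None past the end of the shorter sequence,
    // which models the "nulls are appended" padding rule.
    (0 until len).map(i => f(xs.lift(i), ys.lift(i)))
  }

  def main(args: Array[String]): Unit = {
    // Mirrors: SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y)
    val sums = zipWith(Seq(1, 2), Seq(3, 4)) { (x, y) => x.getOrElse(0) + y.getOrElse(0) }
    println(sums) // Vector(4, 6)
  }
}
```

Mismatched lengths make the padding visible: zipping `Seq(1, 2, 3)` with `Seq("a")` yields `None` on the right for the last two elements.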
[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...
Github user crafty-coder commented on the issue:

    https://github.com/apache/spark/pull/20949

    @HyukjinKwon and @MaxGekk, thanks for your help on this PR! My JIRA id is also **crafty-coder**.
[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...
Github user crafty-coder commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20949#discussion_r203306174

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -512,6 +513,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
       }
     }
    +
    +  test("SPARK-19018: Save csv with custom charset") {
    +
    +    // scalastyle:off nonascii
    +    val content = "µà áâä ÃÃÃ"
    +    // scalastyle:on nonascii
    +
    +    Seq("iso-8859-1", "utf-8", "utf-16", "utf-32", "windows-1250").foreach { encoding =>
    +      withTempDir { dir =>
    +        val csvDir = new File(dir, "csv")
    +
    +        val originalDF = Seq(content).toDF("_c0").repartition(1)
    +        originalDF.write
    +          .option("encoding", encoding)
    +          .csv(csvDir.getCanonicalPath)
    +
    +        csvDir.listFiles().filter(_.getName.endsWith("csv")).foreach({ csvFile =>
    --- End diff --

    What do you mean?
[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...
Github user crafty-coder commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20949#discussion_r203286908

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -512,6 +513,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
       }
     }
    +
    +  test("SPARK-19018: Save csv with custom charset") {
    +
    +    // scalastyle:off nonascii
    +    val content = "µà áâä ÃÃÃ"
    +    // scalastyle:on nonascii
    +
    +    Seq("iso-8859-1", "utf-8", "utf-16", "utf-32", "windows-1250").foreach { encoding =>
    +      withTempDir { dir =>
    +        val csvDir = new File(dir, "csv")
    +
    +        val originalDF = Seq(content).toDF("_c0").repartition(1)
    +        originalDF.write
    +          .option("encoding", encoding)
    +          .csv(csvDir.getCanonicalPath)
    +
    +        csvDir.listFiles().filter(_.getName.endsWith("csv")).foreach({ csvFile =>
    +          val readback = Files.readAllBytes(csvFile.toPath)
    +          val expected = (content + "\n").getBytes(Charset.forName(encoding))
    --- End diff --

    Good point!
[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...
Github user crafty-coder commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20949#discussion_r203023263

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -512,6 +512,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
       }
     }
    +
    +  test("Save csv with custom charset") {
    +    Seq("iso-8859-1", "utf-8", "windows-1250").foreach { encoding =>
    +      withTempDir { dir =>
    +        val csvDir = new File(dir, "csv").getCanonicalPath
    +        // scalastyle:off
    +        val originalDF = Seq("µà áâä ÃÃÃ").toDF("_c0")
    +        // scalastyle:on
    +        originalDF.write
    +          .option("header", "false")
    --- End diff --

    My bad, there is no reason. It's fixed in the next commit.
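The byte-level check the test performs can be sketched without Spark: a line written in a given charset, read back as raw bytes, should match what `String#getBytes` produces for that charset (including the trailing newline the CSV writer emits). This is a standalone model using only `java.nio`; the object name `CharsetRoundTrip` is illustrative.

```scala
import java.nio.charset.Charset
import java.nio.file.Files

// Standalone sketch of the CSVSuite check (no Spark): write one line in the
// given encoding, read the raw bytes back, and compare them against the
// bytes String#getBytes produces for the same charset.
object CharsetRoundTrip {
  def roundTrips(content: String, encoding: String): Boolean = {
    val charset = Charset.forName(encoding)
    val path = Files.createTempFile("csv-charset", ".csv")
    try {
      // The writer appends a newline after each record, so include it here too.
      Files.write(path, (content + "\n").getBytes(charset))
      Files.readAllBytes(path).sameElements((content + "\n").getBytes(charset))
    } finally Files.delete(path)
  }

  def main(args: Array[String]): Unit = {
    // Same encodings the test exercises.
    Seq("iso-8859-1", "utf-8", "utf-16", "utf-32", "windows-1250").foreach { enc =>
      println(s"$enc -> ${roundTrips("héllo wörld", enc)}")
    }
  }
}
```

Comparing raw bytes (rather than decoded strings) is what makes the test meaningful: it catches a writer that silently falls back to UTF-8 regardless of the `encoding` option.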
[GitHub] spark issue #21247: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 ...
Github user crafty-coder commented on the issue:

    https://github.com/apache/spark/pull/21247

    Hi @holdenk, I'm working on a similar PR (https://github.com/apache/spark/pull/20949) to allow setting the encoding when writing csv files. It would be strange if you could set the encoding when saving a json file but couldn't do the same when saving a csv. It would be nice if you could take a look! Thank you!
[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...
Github user crafty-coder commented on the issue:

    https://github.com/apache/spark/pull/20949

    I would say this change has value on its own. At the moment the csv reader applies the charset config but the csv writer ignores it, which I think is a bit confusing.
[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...
GitHub user crafty-coder opened a pull request:

    https://github.com/apache/spark/pull/20949

    [SPARK-19018][SQL] Add support for custom encoding on csv writer

    ## What changes were proposed in this pull request?

    Add support for custom encoding on the csv writer, see https://issues.apache.org/jira/browse/SPARK-19018

    ## How was this patch tested?

    Added two unit tests in CSVSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/crafty-coder/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20949.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20949

----
commit b9a7bf03b312da151e1d7e37338092bbf5bcb38a
Author: crafty-coder <carlospb86@...>
Date:   2018-03-30T19:35:04Z

    [SPARK-19018][SQL] Add support for custom encoding on csv writer