[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15361 Merging in master/branch-2.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361 @HyukjinKwon Please, do! Thanks a lot for helping here (:
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 I think the recent related code was committed by @rxin. Do you mind if I ask you to take a look, please?
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361 @HyukjinKwon Maybe we can reach someone else with a commit bit? Do you know anyone to ping?
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 ping ..
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 (gentle ping @chenghao-intel @davies)
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 @chenghao-intel @davies Are there other things I should test or take care of?
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/15361 yes, please go ahead. :)
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 Hi @chenghao-intel and @davies, it seems related code paths were updated by you before. Do you mind if I ask you to take a look, please?
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361 @HyukjinKwon It works great! Thank you! My mistake was applying the changes to the same `wrapperFor` method, while in the 2.0.0 sources they have to be placed in the `wrap` method instead, with a small modification to pass the third argument in the recursive call.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 @kxepal Sure, thanks for confirming!
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361 @HyukjinKwon Oh, great news! It seems I backported this patch to 2.0.0 incorrectly. I'm sorry for the false alarm then - unfortunately, I wasn't able to test it with master. I'll give it one more try today, but so far it looks like you solved the problem \o/ Thank you!
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361

Hi @kxepal, I just tested (copied and pasted) the code below:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Spark Hive Example").enableHiveSupport().getOrCreate()
// Import the implicits only after `spark` is defined; they provide `toDF`.
import spark.implicits._

val sv = org.apache.spark.mllib.linalg.Vectors.sparse(7, Array(0, 42), Array(-127, 128))
val df = Seq(("thing", sv)).toDF("thing", "vector")
df.write.format("orc").save("/tmp/thing.orc")
```

and it seems fine with the current master branch. Do you mind if I try to verify this again when it is hopefully backported to branch-2.0?
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361 @HyukjinKwon Thank you a lot! Staying tuned.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 I will test this and fix it here together as well if there are some more cases to handle. Thanks for verifying this PR.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361

@HyukjinKwon Ok, try something like this:

```
scala> val sv = org.apache.spark.mllib.linalg.Vectors.sparse(7, Array(0, 42), Array(-127, 128))
sv: org.apache.spark.mllib.linalg.Vector = (7,[0,42],[-127.0,128.0])

scala> val df = Seq(("thing", sv)).toDF("thing", "vector")
df: org.apache.spark.sql.DataFrame = [thing: string, vector: vector]

scala> df.write.format("orc").save("/tmp/thing.orc")
```
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 @kxepal, Sure, I will definitely try to reproduce as soon as you do. Meanwhile, let me double check this. Thanks.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15361

@HyukjinKwon Thanks for the patch, but unfortunately it doesn't solve the issue. Tested with Spark 2.0.0:

```
Caused by: java.lang.ClassCastException: org.apache.spark.mllib.linalg.VectorUDT cannot be cast to org.apache.spark.sql.types.StructType
    at org.apache.spark.sql.hive.HiveInspectors$class.wrap(HiveInspectors.scala:558)
    at org.apache.spark.sql.hive.orc.OrcSerializer.wrap(OrcFileFormat.scala:164)
    at org.apache.spark.sql.hive.orc.OrcSerializer.wrapOrcStruct(OrcFileFormat.scala:202)
    at org.apache.spark.sql.hive.orc.OrcSerializer.serialize(OrcFileFormat.scala:168)
    at org.apache.spark.sql.hive.orc.OrcOutputWriter.writeInternal(OrcFileFormat.scala:253)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:255)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1325)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
```

Let me try to make a simple Scala test case that reproduces the issue from the shell. Maybe this will be more helpful.
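
Editor's note: the `ClassCastException` above is the kind of failure one would expect when a serializer casts a column's declared type straight to `StructType` although the column is declared as a user-defined type. The sketch below is a toy model of one way to avoid that, by recursing on the type the UDT is stored as. Everything in it (`DataType`, `UserDefinedType`, `UdtUnwrapSketch.wrap`) is a hypothetical stand-in written for illustration; it is not Spark's `HiveInspectors` code and is not claimed to be the change made in this PR.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
sealed trait DataType
case object StringType extends DataType
case object DoubleType extends DataType
final case class StructType(fields: List[(String, DataType)]) extends DataType
// A user-defined type exposes the built-in type its values are stored as.
final case class UserDefinedType(name: String, sqlType: DataType) extends DataType

object UdtUnwrapSketch {
  // Render a value according to its declared type.
  def wrap(value: Any, dataType: DataType): String = dataType match {
    // Recurse on the underlying storage type instead of casting the UDT itself.
    case udt: UserDefinedType => wrap(value, udt.sqlType)
    case StructType(fields) =>
      val values = value.asInstanceOf[Seq[Any]]
      fields.zip(values)
        .map { case ((name, dt), v) => s"$name=${wrap(v, dt)}" }
        .mkString("{", ",", "}")
    case StringType => value.asInstanceOf[String]
    case DoubleType => value.asInstanceOf[Double].toString
  }
}
```

In this model, a value whose declared type is `UserDefinedType("point", StructType(...))` is handled by the struct branch after one level of unwrapping, so no cast from the UDT to `StructType` is ever attempted.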
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15361 Merged build finished. Test PASSed.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15361 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66386/ Test PASSed.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15361

**[Test build #66386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66386/consoleFull)** for PR 15361 at commit [`948a5ca`](https://github.com/apache/spark/commit/948a5ca6204460bda0eeefc85ca326c626a707f8).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15361 **[Test build #66386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66386/consoleFull)** for PR 15361 at commit [`948a5ca`](https://github.com/apache/spark/commit/948a5ca6204460bda0eeefc85ca326c626a707f8).
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 @yhuai and @liancheng Do you mind if I ask you to review this, please?