[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16089 Merging in master. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Merged build finished. Test PASSed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69519/ Test PASSed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69519/consoleFull)** for PR 16089 at commit [`27c102d`](https://github.com/apache/spark/commit/27c102deb1701fe62f776fe4da61dac959270b73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69519/consoleFull)** for PR 16089 at commit [`27c102d`](https://github.com/apache/spark/commit/27c102deb1701fe62f776fe4da61dac959270b73).
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16089 @srowen yea the hadoop format api is pretty awkward to use, and actually makes everything more complicated than needed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69488/ Test PASSed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Merged build finished. Test PASSed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69488/consoleFull)** for PR 16089 at commit [`5707218`](https://github.com/apache/spark/commit/57072180a75a06104f0da5d1a544eac5a7e916a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16089 I ask about committers as I'm staring at the V1 and V2 committer APIs right now related to S3 destinations; not directly related to this though.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 @steveloughran Spark is handling the output committing somewhere further up the stack. The path being passed in to `OutputWriterFactory.newInstance` is to a temporary file, such as `/private/var/folders/sq/vmncyd7506q_ch43llrwr8sn6zfknl/T/spark-3db2844b-1f3c-45c2-8bf4-8a3c81440e38/_temporary/0/_temporary/attempt_20161201081833__m_00_0/part-0-8dd44cea-c01e-4bfe-ab03-641ebce18afb.txt`. I'll make a pass through the existing tests to see if anything obvious is missing.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16089 AFAIK, the big thing the FileOutputFormat really adds is not the compression, but the output committer and the stuff to go with that (working directories, paths, etc etc). If you aren't going near that, and just want a fast write of .csv and jackson with optional compression, well, I don't see anything in the code I'd run away from. If you do want to think about how to write CSV files during the output of speculative work in the presence of failures, well, that's where the mapred.lib.output code really comes out to play. Otherwise, in general PR review mode: Tests? What if the code asks for a committer that isn't there, passes in null sequences in rows to write, tries to hit the buffer corner cases. Hopefully those exist already, but if not, now is a good time to try to break things.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69488/consoleFull)** for PR 16089 at commit [`5707218`](https://github.com/apache/spark/commit/57072180a75a06104f0da5d1a544eac5a7e916a7).
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16089 I was going to say, hm, are we sure we want to reimplement / go around the Hadoop support for this? but in practice it looks like it actually simplifies some things. At the moment I can't think of any particular behaviors we're missing by avoiding the Input/OutputFormat. But CC @vanzin @steveloughran for any comment.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69457/ Test PASSed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Merged build finished. Test PASSed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69457/consoleFull)** for PR 16089 at commit [`56667bd`](https://github.com/apache/spark/commit/56667bd86c1dbb52fb47134042e5a529241a0637). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69457/consoleFull)** for PR 16089 at commit [`56667bd`](https://github.com/apache/spark/commit/56667bd86c1dbb52fb47134042e5a529241a0637).
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 Doh, forgot to run the Hive tests. Should be fixed now.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69449/ Test FAILed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Merged build finished. Test FAILed.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69449/consoleFull)** for PR 16089 at commit [`298e507`](https://github.com/apache/spark/commit/298e507d5c42328de610d6109afb11076aadfb96). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16089 **[Test build #69449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69449/consoleFull)** for PR 16089 at commit [`298e507`](https://github.com/apache/spark/commit/298e507d5c42328de610d6109afb11076aadfb96).
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16089 Jenkins, this is ok to test
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16089 Yea then this is definitely fine.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 Yep. It uses the Hadoop `FileSystem` class to open files, just like `TextOutputFormat` does.
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16089 Does this work against file systems with HDFS API (not local posix)? If yes, sounds good!
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 This touches a fair number of components. I also haven't done any performance testing to see what the impact of this is. Curious what your thoughts are? cc/ @marmbrus @rxin @JoshRosen
[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16089 Can one of the admins verify this patch?