[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 thanks for the review everyone! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 Thanks! Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82745/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82745/testReport)** for PR 18979 at commit [`c0e81a1`](https://github.com/apache/spark/commit/c0e81a1c87011efdc010f1c9ba28dde003458667). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82745/testReport)** for PR 18979 at commit [`c0e81a1`](https://github.com/apache/spark/commit/c0e81a1c87011efdc010f1c9ba28dde003458667). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 done. Not writing 0-byte files will offer significant speedup against object stores, where the cost of a call to getFileStatus() can take hundreds of millis. I look forward to it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 Could you resolve the conflicts again? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)** for PR 18979 at commit [`649f8da`](https://github.com/apache/spark/commit/649f8da245443567b842b697a4d47e5241eb5946). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82731/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82732/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)** for PR 18979 at commit [`d3f96f6`](https://github.com/apache/spark/commit/d3f96f63f263f134a79e76e8bdd1961f333b7c7a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 Hi, @steveloughran . > is the issue with ORC that if there's nothing to write, it doesn't generate a file (so avoiding that issue with sometimes you get 0-byte ORC files & things downstream fail)? Yes, So far, Spark leave an empty directory in case of ORC. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82730/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)** for PR 18979 at commit [`adab985`](https://github.com/apache/spark/commit/adab985d0455f3f549ebceb03528f0a6a45a31c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 The latest PR update pulls in @dongjoon-hyun's new test; to avoid merge conflict in the Insert suite I've rebased against master. 1. Everything handles missing files on output 2. There's only one logInfo at the end of the execute call, so if many empty files are created, the logs aren't too noisy. 3. There is now some implicit counting of how many files were missing `= submittedFiles - numFiles`; this isn't aggregated and reported though. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)** for PR 18979 at commit [`d3f96f6`](https://github.com/apache/spark/commit/d3f96f63f263f134a79e76e8bdd1961f333b7c7a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)** for PR 18979 at commit [`649f8da`](https://github.com/apache/spark/commit/649f8da245443567b842b697a4d47e5241eb5946). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)** for PR 18979 at commit [`adab985`](https://github.com/apache/spark/commit/adab985d0455f3f549ebceb03528f0a6a45a31c0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Noted :) @dongjoon-hyun : is the issue with ORC that if there's nothing to write, it doesn't generate a file (so avoiding that issue with sometimes you get 0-byte ORC files & things downstream fail)? If so, the warning message which @gatorsmile has proposed is potentially going to mislead people into worrying about a problem which isn't there. and the numFiles metric is going to mislead. I'm starting to worry about how noisy the log would be, both there and when working with s3 when it's playing delayed visibility (rarer). 1. What if this patch just logged at debug: less noise, but still something there if people are trying to debug a mismatch? 1. if there's no file found, numFiles doesn't get incremented. 1. I count the number of files actually submitted 1. And in `getFinalStats()` log @ info if there is a mismatch This would line things up in future for actually returning the list of expected vs actual files up as a metric where it could be reported. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 Gentle ping, @steveloughran ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 Could you also include the [test cases](https://github.com/dongjoon-hyun/spark/blob/b545f281b19120cc2c9e4197cae4b1315969247d/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala#L2054-L2060) to [InsertSuite.scala](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala) ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18979 +1. This solves the regression on writing emtpy dataset with ORC format, too! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 LGTM except a minor comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @viirya : the new data writer API will allow for a broader set of stats to be propagated back from workers. When you are working with the object stores, an useful stat to get back is throttle count & retry count as they can be the cause of why things are slow ... and if it is due to throttling, throwing more workers at the job will actually slow things down. They'd be the ones to look at first --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18979 I don't have strong opinion against this. Incorrect size is an issue but I can't think a better solution for now... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 Will review it tomorrow --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user adrian-ionescu commented on the issue: https://github.com/apache/spark/pull/18979 To me, this looks good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Has anyone had a look at this recently? The problem still exists, and while downstream filesystems can address if they recognise the use case & lie about values, they will be returning invalid values to the caller: spark will be reporting the wrong values. At least with this PR Spark will get to make the decisions about how to react itself. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Related to this, updated spec on [Hadoop output stream, Syncable and StreamCapabilities](https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md). As the doc notes, object stores != filesystems, and while a lot can be done to preserve the metaphor on input, its on output where CRUD inconsistencies surface. along with the logic as "does a 0-byte file get created in create()", "when is data written?", etc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80841/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80841/testReport)** for PR 18979 at commit [`f778213`](https://github.com/apache/spark/commit/f778213d3adee1f2d6c977f92c093930f3d6c013). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80841/testReport)** for PR 18979 at commit [`f778213`](https://github.com/apache/spark/commit/f778213d3adee1f2d6c977f92c093930f3d6c013). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @adrian-ionescu wrote > is there a need for calling getFinalStats() more than once? No. As long as everyone is aware of it, it won't be an issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 > To mimic S3-like behavior, you can overwrite the file system spark.hadoop.fs.$scheme.impl" @gatorsmile: you will be able to do something better soon, as S3A is adding an inconsistent AWS client into `hadoop-aws` JAR, which you can then enable to guarantee consistency delays and inject intermittent faults into the system (throttling, transient network events). All it will take is a config option to switch to this client, plus the chaos-monkey-esque probabilities and delays. This is what I'm already using âyou will be able to as well. That is, no need to switch clients, just go`spark.hadoop.fs.s3a.s3.client.factory.impl=org.apache.hadoop.fs.s3a.InconsistentS3ClientFactory` and wait for the stack traces. The S3A FS itself [needs to do more](https://issues.apache.org/jira/browse/HADOOP-14531) to handle throttling & failures (retry, add failure metrics so throttling & error rates can be measured). Knowing throttling rates is important as it will help identify perf problems due to bad distribution of work across a bucket, excess use of KMS key lookup..., things that in surface in support calls. This patch restores Spark 2.3 to the behaviour it has in Spark 2.2: a brief delay between object creation and visibility does not cause the task to fail --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Currently *nobody should be using s3a:// at the the temp file destination*, which is the same as saying "nobody should be using s3a:// as the direct destination of work", not without a special committer (Netflix, IBM's stocator, ...) or without something to give S3 list consistency. Because today, task commit relies on a list & rename of all files in the task attempt dir, and if you don't get list consistency, you can miss out on files. If you ever hear anyone complaining "it takes too long to commit to s3" then they are using it this way. Tell them to use a consistency layer or to stop it :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18979 Btw, as the file path passed to state tracker should be task temp file, is it common to directly use S3 as temp file output destination? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user adrian-ionescu commented on the issue: https://github.com/apache/spark/pull/18979 Thanks for the fix and tests, @steveloughran! Re 1. -- is there a need for calling `getFinalStats()` more than once? The function doc clearly states that it's not supported and may lead to undefined behaviour. Could be fixed, of course, but depending on the implementation of the stats tracker, that can be at the expense of additional memory or code complexity.. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80803/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80803/testReport)** for PR 18979 at commit [`2a113fd`](https://github.com/apache/spark/commit/2a113fde1653743a3543df8ada395f320b826a3e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18979 To mimic S3-like behavior, you can overwrite the file system `spark.hadoop.fs.$scheme.impl` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18979 cc @adrian-ionescu --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #80803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80803/testReport)** for PR 18979 at commit [`2a113fd`](https://github.com/apache/spark/commit/2a113fde1653743a3543df8ada395f320b826a3e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org