[GitHub] spark issue #16307: [MINOR][BUILD] Fix lint-check failures and javadoc8 brea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16307 **[Test build #70241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70241/testReport)** for PR 16307 at commit [`6f724fa`](https://github.com/apache/spark/commit/6f724fa6d5894a0ee13c887f150a14375f5b4887). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16307: [MINOR][BUILD] Fix lint-check failures and javado...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16307#discussion_r92765013

    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java ---
    @@ -252,7 +252,7 @@ public static long parseSecondNano(String secondNano) throws IllegalArgumentExce
       public final int months;
       public final long microseconds;

    -  public final long milliseconds() {
    +  public long milliseconds() {
    --- End diff --

> Final classes by definition cannot be extended so the final modifier on the method of a final class is redundant.

(http://checkstyle.sourceforge.net/config_modifier.html#RedundantModifier)
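The Checkstyle rule quoted above can be illustrated with a small sketch. The class below is a hypothetical stand-in, not Spark's `CalendarInterval`; the name `Interval` and the constant `MICROS_PER_MILLI` are assumptions made for illustration only.

```java
// Hypothetical stand-in for a final value class; not Spark's CalendarInterval.
final class Interval {
  // Assumed conversion factor, named here only for the example.
  private static final long MICROS_PER_MILLI = 1000L;

  public final long microseconds;

  Interval(long microseconds) {
    this.microseconds = microseconds;
  }

  // No 'final' on the method: the enclosing class is already final, so no
  // subclass can exist to override it. Adding 'final' here is what
  // Checkstyle reports as RedundantModifier.
  public long milliseconds() {
    return microseconds / MICROS_PER_MILLI;
  }
}
```

The `final` on the field is a different matter: it makes the field immutable and is not implied by the class being final, so it stays.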
[GitHub] spark pull request #16307: [MINOR][BUILD] Fix lint-check failures and javado...
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16307

[MINOR][BUILD] Fix lint-check failures and javadoc8 break

## What changes were proposed in this pull request?

This PR proposes to fix lint-check failures and a javadoc8 break. A few errors were introduced, as below:

**lint-check failures**

```
[ERROR] src/test/java/org/apache/spark/network/TransportClientFactorySuite.java:[45,1] (imports) RedundantImport: Duplicate import to line 43 - org.apache.spark.network.util.MapConfigProvider.
[ERROR] src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java:[255,10] (modifier) RedundantModifier: Redundant 'final' modifier.
```

**javadoc8**

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:19: error: bad use of '>'
[error] * "max" -> "2016-12-05T20:54:20.827Z" // maximum event time seen in this trigger
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:20: error: bad use of '>'
[error] * "min" -> "2016-12-05T20:54:20.827Z" // minimum event time seen in this trigger
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:21: error: bad use of '>'
[error] * "avg" -> "2016-12-05T20:54:20.827Z" // average event time seen in this trigger
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:22: error: bad use of '>'
[error] * "watermark" -> "2016-12-05T20:54:20.827Z" // watermark used in this trigger
[error]
```

## How was this patch tested?

Manually checked as below:

**lint-check failures**

```
./dev/lint-java
Checkstyle checks passed.
```

**javadoc8**

This seems to be hidden in the API doc, but I manually checked after removing the access modifier, as below.

It does not appear to render properly (scaladoc):

![2016-12-16 3 40 34](https://cloud.githubusercontent.com/assets/6477701/21255175/8df1fe6e-c3ad-11e6-8cda-ce7f76c6677a.png)

After this PR, it renders as below:

- scaladoc

  ![2016-12-16 3 40 23](https://cloud.githubusercontent.com/assets/6477701/21255135/4a11dab6-c3ad-11e6-8ab2-b091c4f45029.png)

- javadoc

  ![2016-12-16 3 41 10](https://cloud.githubusercontent.com/assets/6477701/21255137/4bba1d9c-c3ad-11e6-9b88-62f1f697b56a.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark lint-javadoc8

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16307.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16307

commit 6f724fa6d5894a0ee13c887f150a14375f5b4887
Author: hyukjinkwon
Date: 2016-12-16T07:29:19Z

Fix lint-check failures and javadoc8 break
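For readers unfamiliar with the `bad use of '>'` error: Javadoc 8 checks comment text as HTML, so a bare `>` in a doc comment is rejected. The sketch below shows one common way to write such an example in a Java doc comment, by wrapping it in `<pre>{@code ...}</pre>` (or escaping it as `&gt;`). It is an illustration only, not the change made in this PR, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class; not part of Spark.
final class EventTimeStatsExample {
  /**
   * Returns event-time statistics keyed by name. Inside a {@code @code} block
   * the arrow is treated as literal text, so Javadoc 8 accepts it:
   * <pre>{@code
   *   "max" -> "2016-12-05T20:54:20.827Z"
   *   "min" -> "2016-12-05T20:54:20.827Z"
   * }</pre>
   * In running text the same arrow would be written as {@code ->} or -&gt;.
   *
   * @return a map from statistic name to an ISO-8601 timestamp string
   */
  static Map<String, String> eventTime() {
    Map<String, String> stats = new LinkedHashMap<>();
    stats.put("max", "2016-12-05T20:54:20.827Z");
    stats.put("min", "2016-12-05T20:54:20.827Z");
    return stats;
  }
}
```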
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user ankurdave commented on the issue: https://github.com/apache/spark/pull/16271 Merged into master.
[GitHub] spark issue #16304: [SPARK-18894][SS] Disallow went time watermark delay thr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16304 Merged build finished. Test FAILed.
[GitHub] spark issue #16304: [SPARK-18894][SS] Disallow went time watermark delay thr...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16304

**[Test build #70233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70233/testReport)** for PR 16304 at commit [`8c603f4`](https://github.com/apache/spark/commit/8c603f46deb7021c9c2b6297be532342d2d8bfab).

* This patch **fails from timeout after a configured wait of \`250m\`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16304: [SPARK-18894][SS] Disallow went time watermark delay thr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16304 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70233/ Test FAILed.
[GitHub] spark pull request #16271: [SPARK-18845][GraphX] PageRank has incorrect init...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16271
[GitHub] spark pull request #16306: Branch 2.1
Github user qhhan closed the pull request at: https://github.com/apache/spark/pull/16306
[GitHub] spark pull request #16306: Branch 2.1
GitHub user qhhan opened a pull request:

https://github.com/apache/spark/pull/16306

Branch 2.1

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16306.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16306

commit 416bc3dd3db7f7ae2cc7b3ffe395decd0c5b73f9
Author: Zheng RuiFeng
Date: 2016-11-16T10:53:23Z

[SPARK-18446][ML][DOCS] Add links to API docs for ML algos

## What changes were proposed in this pull request?

Add links to API docs for ML algos

## How was this patch tested?

Manual checking for the API links

Author: Zheng RuiFeng

Closes #15890 from zhengruifeng/algo_link.

(cherry picked from commit a75e3fe923372c56bc1b2f4baeaaf5868ad28341)
Signed-off-by: Sean Owen

commit b0ae8712358fc8c07aa5efe4d0bd337e7e452078
Author: Xianyang Liu
Date: 2016-11-16T11:59:00Z

[SPARK-18420][BUILD] Fix the errors caused by lint check in Java

Small fix, fix the errors caused by lint check in Java

- Clear unused objects and `UnusedImports`.
- Add comments around the method `finalize` of `NioBufferedFileInputStream` to turn off checkstyle.
- Cut the line which is longer than 100 characters into two lines.

Travis CI.

```
$ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
$ dev/lint-java
```

Before:

```
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[21,8] (imports) UnusedImports: Unused import - org.apache.commons.crypto.cipher.CryptoCipherFactory.
[ERROR] src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java:[516,5] (modifier) RedundantModifier: Redundant 'public' modifier.
[ERROR] src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java:[133] (coding) NoFinalizer: Avoid using finalizer method.
[ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeMapData.java:[71] (sizes) LineLength: Line is longer than 100 characters (found 113).
[ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java:[112] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/test/java/org/apache/spark/sql/catalyst/expressions/HiveHasherSuite.java:[31,17] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
[ERROR]src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
[ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.ml.linalg.Vectors.
[ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[51] (regexp) RegexpSingleline: No trailing whitespace allowed.
```

After:

```
$ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
$ dev/lint-java
Using `mvn` from path: /home/travis/build/ConeyLiu/spark/build/apache-maven-3.3.9/bin/mvn
Checkstyle checks passed.
```

Author: Xianyang Liu

Closes #15865 from ConeyLiu/master.

(cherry picked from commit 7569cf6cb85bda7d0e76d3e75e286d4796e77e08)
Signed-off-by: Sean Owen

commit c0dbe08d604dea543eb17ccb802a8a20d6c21a69
Author: gatorsmile
Date: 2016-11-16T16:25:15Z

[SPARK-18415][SQL] Weird Plan Output when CTE used in RunnableCommand

### What changes were proposed in this pull request?

Currently, when CTE is used in RunnableCommand, the Analyzer does not replace the logical node `With`. The child plan of RunnableCommand is not resolved. Thus, the output of the `With` plan node looks very confusing.

For example,

```
sql(
  """
    |CREATE VIEW cte_view AS
    |WITH w AS (SELECT 1 AS n), cte1 (select 2), cte2 as (select 3)
    |SELECT n FROM w
  """.stripMargin).explain()
```

The output is like
```
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user ankurdave commented on the issue: https://github.com/apache/spark/pull/16271 Thanks @aray for the explanation. I agree with @srowen - this looks reasonable to me. I'm going to merge it.
[GitHub] spark issue #16305: [SPARK-18895][TESTS] Fix resource-closing-related and pa...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16305 cc @srowen, Could you please take a look?
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282 **[Test build #70240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70240/testReport)** for PR 16282 at commit [`7b75dd6`](https://github.com/apache/spark/commit/7b75dd6a6213441a51d6c58b230d4e52097595ea).
[GitHub] spark pull request #16305: [SPARK-18895][TESTS] Fix resource-closing-related...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16305#discussion_r92762998

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -231,8 +236,10 @@ private[deploy] object RPackageUtils extends Logging {
         val zipOutputStream = new ZipOutputStream(new FileOutputStream(zipFile, false))
         try {
           filesToBundle.foreach { file =>
    -        // get the relative paths for proper naming in the zip file
    -        val relPath = file.getAbsolutePath.replaceFirst(dir.getAbsolutePath, "")
    +        // Get the relative paths for proper naming in the zip file. Note that
    +        // the separator should always be / for according to ZIP specification.
    +        // `relPath` here should be, for example, "/packageTest/def.R" or "/test.R".
    +        val relPath = file.toURI.toString.replaceFirst(dir.toURI.toString.stripSuffix("/"), "")
    --- End diff --

cc @shivaram, could I please ask you to take a look? This fixes the test `SparkR zipping works properly` in `RPackageUtilsSuite` on Windows.
[GitHub] spark issue #16305: [SPARK-18895][TESTS] Fix resource-closing-related and pa...
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16305

Build started: [TESTS] `org.apache.spark.scheduler.EventLoggingListenerSuite` [![PR-16305](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=4A39F6AA-CAFD-4BFD-8CB9-5BCAB193BC77=true)](https://ci.appveyor.com/project/spark-test/spark/branch/4A39F6AA-CAFD-4BFD-8CB9-5BCAB193BC77)

Build started: [TESTS] `org.apache.spark.scheduler.ReplayListenerSuite` [![PR-16305](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D89FD4C9-3196-4C28-A9CA-36A6F4B467AD=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D89FD4C9-3196-4C28-A9CA-36A6F4B467AD)

Build started: [TESTS] `org.apache.spark.metrics.InputOutputMetricsSuite` [![PR-16305](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=E40C21AC-56C7-4DFC-9E62-2C6CFEC98172=true)](https://ci.appveyor.com/project/spark-test/spark/branch/E40C21AC-56C7-4DFC-9E62-2C6CFEC98172)

Build started: [TESTS] `org.apache.spark.deploy.RPackageUtilsSuite` [![PR-16305](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=08CE93A8-BCB1-47D9-8683-755109827A62=true)](https://ci.appveyor.com/project/spark-test/spark/branch/08CE93A8-BCB1-47D9-8683-755109827A62)

Diff: https://github.com/apache/spark/compare/master...spark-test:08CE93A8-BCB1-47D9-8683-755109827A62
[GitHub] spark issue #16305: [SPARK-18895][TESTS] Fix resource-closing-related and pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16305 **[Test build #70238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70238/testReport)** for PR 16305 at commit [`00d5322`](https://github.com/apache/spark/commit/00d5322e76f1e840afc2546abf7f99dc0dd6757d).
[GitHub] spark issue #16261: [SPARK-18836] [CORE] Serialize one copy of task metrics ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16261 **[Test build #70239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70239/testReport)** for PR 16261 at commit [`3a76605`](https://github.com/apache/spark/commit/3a76605520dba8b73ce8d3e01059ea21ea7259ab).
[GitHub] spark issue #16261: [SPARK-18836] [CORE] Serialize one copy of task metrics ...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16261 Jenkins, retest this please
[GitHub] spark pull request #16305: [SPARK-18895][TESTS] Fix resource-closing-related...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16305#discussion_r92761800

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -231,8 +236,10 @@ private[deploy] object RPackageUtils extends Logging {
         val zipOutputStream = new ZipOutputStream(new FileOutputStream(zipFile, false))
         try {
           filesToBundle.foreach { file =>
    -        // get the relative paths for proper naming in the zip file
    -        val relPath = file.getAbsolutePath.replaceFirst(dir.getAbsolutePath, "")
    +        // Get the relative paths for proper naming in the zip file. Note that
    +        // the separator should always be / for according to ZIP specification.
    +        // `relPath` here should be, for example, "/packageTest/def.R" or "/test.R".
    +        val relPath = file.toURI.toString.replaceFirst(dir.toURI.toString.stripSuffix("/"), "")
    --- End diff --

It should always be `/` according to the ZIP specification (see `4.4.17 file name: (Variable)` in https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT).
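The idea in this diff can be sketched in Java under a few stated assumptions. The class and method names below are hypothetical. Unlike the Scala line in the diff, the sketch wraps the directory URI in `Pattern.quote`, a deliberate deviation that keeps `replaceFirst` from reading the prefix as a regular expression. The second helper shows why a raw Windows path is a problem in that position: `\p` followed by `r` is parsed as a character-property escape.

```java
import java.io.File;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper; not Spark's RPackageUtils.
final class ZipEntryNames {
  // Returns the path of 'file' relative to 'dir' with a leading "/" and
  // "/" separators on every platform, because File.toURI() always uses "/".
  static String relativeEntryName(File dir, File file) {
    String dirUri = dir.toURI().toString();
    // toURI() appends "/" only when the path is an existing directory,
    // so strip it if present to get a stable prefix.
    if (dirUri.endsWith("/")) {
      dirUri = dirUri.substring(0, dirUri.length() - 1);
    }
    String fileUri = file.toURI().toString();
    // Pattern.quote is an addition in this sketch: it makes replaceFirst
    // match the prefix literally instead of as a regex.
    return fileUri.replaceFirst(Pattern.quote(dirUri), "");
  }

  // Reports whether a string compiles as a regular expression.
  static boolean isValidRegex(String candidate) {
    try {
      Pattern.compile(candidate);
      return true;
    } catch (PatternSyntaxException e) {
      return false;
    }
  }
}
```

One caveat with the URI approach: characters such as spaces are percent-encoded in the URI string, so a path containing them would yield an encoded entry name.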
[GitHub] spark pull request #16261: [SPARK-18836] [CORE] Serialize one copy of task m...
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16261#discussion_r92761816

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -58,13 +60,17 @@ private[spark] abstract class Task[T](
         val stageId: Int,
         val stageAttemptId: Int,
         val partitionId: Int,
    -    // The default value is only used in tests.
    -    val metrics: TaskMetrics = TaskMetrics.registered,
         @transient var localProperties: Properties = new Properties,
    +    // The default value is only used in tests.
    +    serializedTaskMetrics: Array[Byte] =
    --- End diff --

It's only used in the constructor right now, so I didn't use `val`. Is there a style convention that we usually follow?
[GitHub] spark pull request #16305: [SPARK-18895][TESTS] Fix resource-closing-related...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16305#discussion_r92761717

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -176,26 +176,31 @@ private[deploy] object RPackageUtils extends Logging {
           val file = new File(Utils.resolveURI(jarPath))
           if (file.exists()) {
             val jar = new JarFile(file)
    -        if (checkManifestForR(jar)) {
    -          print(s"$file contains R source code. Now installing package.", printStream, Level.INFO)
    -          val rSource = extractRFolder(jar, printStream, verbose)
    -          if (RUtils.rPackages.isEmpty) {
    -            RUtils.rPackages = Some(Utils.createTempDir().getAbsolutePath)
    -          }
    -          try {
    -            if (!rPackageBuilder(rSource, printStream, verbose, RUtils.rPackages.get)) {
    -              print(s"ERROR: Failed to build R package in $file.", printStream)
    -              print(RJarDoc, printStream)
    +        Utils.tryWithSafeFinally {
    --- End diff --

The actual change is as below:

```scala
Utils.tryWithSafeFinally {
  ...
} {
  jar.close()
}
```
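The point of this change is that the `JarFile` is always closed, which matters on Windows, where an open handle prevents the file from being deleted. As a rough Java analogue (not Spark's `Utils.tryWithSafeFinally`, which is a Scala helper), try-with-resources gives the same guarantee. The class and method names below are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical example class; not part of Spark.
final class JarInspection {
  // Opens the jar, reads from it, and always closes it, even if reading throws.
  static boolean hasManifest(File file) throws IOException {
    try (JarFile jar = new JarFile(file)) {
      return jar.getManifest() != null;
    }
  }

  // Writes a tiny jar to a temp file, inspects it, then deletes it.
  // Returns true only if the manifest was found and the delete succeeded.
  static boolean inspectThenDelete() {
    try {
      File file = File.createTempFile("demo", ".jar");
      Manifest manifest = new Manifest();
      manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
      try (JarOutputStream out =
          new JarOutputStream(new FileOutputStream(file), manifest)) {
        // No entries are needed; the manifest alone makes a readable jar.
      }
      boolean found = hasManifest(file);
      return file.delete() && found;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

If both the body and `close()` throw, try-with-resources keeps the body's exception and attaches the other as suppressed, so the original failure is not masked.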
[GitHub] spark pull request #16305: [SPARK-18895][TESTS] Fix resource-closing-related...
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16305

[SPARK-18895][TESTS] Fix resource-closing-related and path-related test failures in identified ones on Windows

## What changes were proposed in this pull request?

There are several tests failing due to resource-closing-related and path-related problems on Windows as below.

- `RPackageUtilsSuite`:

```
- build an R package from a jar end to end *** FAILED *** (1 second, 625 milliseconds)
  java.io.IOException: Unable to delete file: C:\projects\spark\target\tmp\1481729427517-0\a\dep2\d\dep2-d.jar
  at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
  at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
  at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)

- faulty R package shows documentation *** FAILED *** (359 milliseconds)
  java.io.IOException: Unable to delete file: C:\projects\spark\target\tmp\1481729428970-0\dep1-c.jar
  at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
  at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
  at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)

- SparkR zipping works properly *** FAILED *** (47 milliseconds)
  java.util.regex.PatternSyntaxException: Unknown character property name {r} near index 4
  C:\projects\spark\target\tmp\1481729429282-0
      ^
  at java.util.regex.Pattern.error(Pattern.java:1955)
  at java.util.regex.Pattern.charPropertyNodeFor(Pattern.java:2781)
```

- `InputOutputMetricsSuite`:

```
- input metrics for old hadoop with coalesce *** FAILED *** (240 milliseconds)
  java.io.IOException: Not a file: file:/C:/projects/spark/core/ignored
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)

- input metrics with cache and coalesce *** FAILED *** (109 milliseconds)
  java.io.IOException: Not a file: file:/C:/projects/spark/core/ignored
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)

- input metrics for new Hadoop API with coalesce *** FAILED *** (0 milliseconds)
  java.lang.IllegalArgumentException: Wrong FS: file://C:\projects\spark\target\tmp\spark-9366ec94-dac7-4a5c-a74b-3e7594a692ab\test\InputOutputMetricsSuite.txt, expected: file:///
  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
  at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:462)
  at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:114)

- input metrics when reading text file *** FAILED *** (110 milliseconds)
  java.io.IOException: Not a file: file:/C:/projects/spark/core/ignored
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)

- input metrics on records read - simple *** FAILED *** (125 milliseconds)
  java.io.IOException: Not a file: file:/C:/projects/spark/core/ignored
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)

- input metrics on records read - more stages *** FAILED *** (110 milliseconds)
  java.io.IOException: Not a file: file:/C:/projects/spark/core/ignored
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)

- input metrics on records - New Hadoop API *** FAILED *** (16 milliseconds)
  java.lang.IllegalArgumentException: Wrong FS: file://C:\projects\spark\target\tmp\spark-3f10a1a4-7820-4772-b821-25fd7523bf6f\test\InputOutputMetricsSuite.txt, expected: file:///
  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
  at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:462)
  at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:114)

- input metrics on records read with cache *** FAILED *** (93 milliseconds)
  java.io.IOException: Not a file: file:/C:/projects/spark/core/ignored
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
[GitHub] spark issue #16261: [SPARK-18836] [CORE] Serialize one copy of task metrics ...
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/16261

With the help of @kayousterhout I ran a scheduling microbenchmark (Code [1]) with 1 tasks per stage on 20 m2.4xlarge machines on EC2 (160 cores). From 10 trials, I measured the average time taken per stage.

Before this PR (baseline): 2526.81 ms
With this PR: 1741.99 ms

So overall we get a 785ms improvement (~30%) in this case. To figure out more closely where the speedup was coming from I added a timer inside the function `Task.serializeWithDependencies`[2].

Avg. Time to serialize one task without this PR: 0.119954 ms
Avg. Time to serialize one task with this PR: 0.0556422 ms

Thus we save around 0.064 ms in serialization time per task and that explains most of the improvements.

[1] https://gist.github.com/shivaram/c84d18512fe8ba9c047e3d2b170b9f68
[2] https://github.com/apache/spark/blob/172a52f5d31337d90155feb7072381e8d5712288/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L224
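The arithmetic behind these figures can be reproduced with a small sketch. The helper names are hypothetical, and the inputs are just the four averages quoted in the comment: 2526.81 ms and 1741.99 ms per stage, and 0.119954 ms and 0.0556422 ms per task.

```java
// Hypothetical helper for checking the quoted numbers; not part of Spark.
final class SpeedupArithmetic {
  // Absolute saving: time before minus time after, in the same unit.
  static double absoluteSaving(double before, double after) {
    return before - after;
  }

  // Relative saving as a fraction of the baseline (0.30 means 30%).
  static double relativeSaving(double before, double after) {
    return (before - after) / before;
  }
}
```

With the per-stage averages, `absoluteSaving(2526.81, 1741.99)` is about 784.82 ms and `relativeSaving` is about 0.31, in line with the quoted 785 ms and ~30%. With the per-task averages, the saving is about 0.0643 ms, roughly 54% of the baseline serialization time.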
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92761572

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -964,11 +970,17 @@ object StaticSQLConf {
     }
   }
 
-  val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir")
-    .doc("The default location for managed databases and tables.")
+  val DEFAULT_WAREHOUSE_PATH = buildConf("spark.sql.default.warehouse.dir")
+    .doc("The default location for managed databases and tables. " +
+      "Used if spark.sql.warehouse.dir is not set")
     .stringConf
     .createWithDefault(Utils.resolveURI("spark-warehouse").toString)
 
+  val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir")
+    .doc("The location for managed databases and tables.")
--- End diff --

The description is not right. `spark.sql.warehouse.dir` is still the default location when we create a database/table without providing the location value.
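The lookup order implied by the diff and the review comment can be sketched as follows. This is an assumption-laden illustration, not Spark's code: the class `WarehouseDirLookup` and the method `resolveLocation` are hypothetical, a plain `Map` stands in for the SQL configuration, and `spark.sql.default.warehouse.dir` is the key proposed in the diff under review, which may not exist in any released version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fallback chain discussed in the review.
class WarehouseDirLookup {
    static final String WAREHOUSE = "spark.sql.warehouse.dir";
    // Key proposed in the diff under review (assumption: may never have been released).
    static final String DEFAULT_WAREHOUSE = "spark.sql.default.warehouse.dir";

    // An explicit location wins; otherwise spark.sql.warehouse.dir is the default
    // location, and the proposed key is consulted only when that one is unset.
    static String resolveLocation(Map<String, String> conf, String explicitLocation,
                                  String builtInDefault) {
        if (explicitLocation != null) {
            return explicitLocation;
        }
        String warehouse = conf.get(WAREHOUSE);
        if (warehouse != null) {
            return warehouse;
        }
        return conf.getOrDefault(DEFAULT_WAREHOUSE, builtInDefault);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DEFAULT_WAREHOUSE, "/tmp/default-warehouse");
        System.out.println(resolveLocation(conf, null, "spark-warehouse"));
        conf.put(WAREHOUSE, "/data/warehouse");
        System.out.println(resolveLocation(conf, null, "spark-warehouse"));
    }
}
```

Under this reading, `spark.sql.warehouse.dir` still acts as the default location for databases and tables created without one, which is why the reviewer objects to dropping the word "default" from its description.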
[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16233 cc @yhuai @cloud-fan Please have a look at this when you have time. Thanks!
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/16135 cc @rxin thanks for checking.
[GitHub] spark issue #15211: [SPARK-14709][ML] spark.ml API for linear SVM
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15211 Remove WIP.
[GitHub] spark pull request #16294: [SPARK-18669][SS][DOCS] Update Apache docs for St...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16294#discussion_r92757455

--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -671,12 +678,114 @@ windowedCounts = words.groupBy(
 
+### Handling Late Data and Watermarking
 Now consider what happens if one of the events arrives late to the application. For example, a word that was generated at 12:04 but it was received at 12:11.
-Since this windowing is based on the time in the data, the time 12:04 should be considered for windowing. This occurs naturally in our window-based grouping - the late data is automatically placed in the proper windows and the correct aggregates are updated as illustrated below.
+Since this windowing is based on the time in the data, the time 12:04 should be considered for
+windowing. This occurs naturally in our window-based grouping - the late data is
+automatically placed in the proper windows and the correct aggregates are updated as illustrated below.
 
 ![Handling Late Data](img/structured-streaming-late-data.png)
 
+Furthermore, since Spark 2.1, you can define a watermark on the event time, and specify the threshold
+on how late the date can be in terms of the event time. The engine will automatically track the
+event time and drop any state that is related to old windows that are not expected to receive older
+than (max event time seen - late threshold). This allows the engine to bound the size of the state
+that is needed for calculating windowed aggregates. For example, we can apply watermarking to the
+previous example as follows.
+
+
+
+
+{% highlight scala %}
+import spark.implicits._
--- End diff --

Just curious, will there be a complete example in the examples folder? In documents such as the ML and SQL guides, the code is included from an example file instead of being hard-coded in the document. Thanks!
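The rule quoted in the diff above, dropping state older than (max event time seen - late threshold), can be sketched in plain Java. This is a minimal per-record model and not Spark's implementation: `WatermarkSketch`, `onEvent`, and `openWindows` are hypothetical names, windows are simple tumbling windows, and times are unitless longs (minutes since midnight in `main`). In particular, the real engine may advance the watermark only at batch boundaries, so it should not be read as a precise cutoff.

```java
import java.util.TreeMap;

// Hypothetical per-record sketch of watermark-based state cleanup; not Spark's code.
class WatermarkSketch {
    private final long lateThreshold;   // how late data may arrive, in event-time units
    private final long windowLength;    // tumbling window length, in the same units
    private long maxEventTime = Long.MIN_VALUE;
    // Event counts keyed by window start; this is the state the watermark bounds.
    private final TreeMap<Long, Long> countsByWindowStart = new TreeMap<>();

    WatermarkSketch(long lateThreshold, long windowLength) {
        this.lateThreshold = lateThreshold;
        this.windowLength = windowLength;
    }

    // Returns true if the event was counted, false if it was older than the watermark.
    boolean onEvent(long eventTime) {
        if (maxEventTime != Long.MIN_VALUE && eventTime < maxEventTime - lateThreshold) {
            return false;
        }
        maxEventTime = Math.max(maxEventTime, eventTime);
        long windowStart = Math.floorDiv(eventTime, windowLength) * windowLength;
        countsByWindowStart.merge(windowStart, 1L, Long::sum);

        // Drop every window that ended at or before the watermark.
        long watermark = maxEventTime - lateThreshold;
        countsByWindowStart.headMap(watermark - windowLength, true).clear();
        return true;
    }

    int openWindows() {
        return countsByWindowStart.size();
    }

    public static void main(String[] args) {
        // 10-minute threshold, 10-minute windows; times are minutes since midnight.
        WatermarkSketch sketch = new WatermarkSketch(10, 10);
        System.out.println(sketch.onEvent(12 * 60 + 11)); // on-time event at 12:11
        System.out.println(sketch.onEvent(12 * 60 + 4));  // late event generated at 12:04
        System.out.println(sketch.openWindows());
    }
}
```

Because every retained window ends after the watermark, the amount of state kept is bounded by the threshold plus one window length, which is the point the documentation text makes.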
[GitHub] spark issue #14868: [SPARK-16283][SQL] Implements percentile_approx aggregat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14868 Although the function returns the approximate percentile, the result is still deterministic. Is my understanding right?
[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow went time watermark de...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92756861

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -572,6 +572,10 @@ class Dataset[T] private[sql](
     val parsedDelay = Option(CalendarInterval.fromString("interval " + delayThreshold))
       .getOrElse(throw new AnalysisException(s"Unable to parse time delay '$delayThreshold'"))
+    // Threshold specified in months/years is non-deterministic
--- End diff --

sgtm
[GitHub] spark pull request #16300: [SPARK-18892][SQL] Alias percentile_approx approx...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16300
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16300 Merging in master/branch-2.1.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16300 **[Test build #70237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70237/testReport)** for PR 16300 at commit [`01c6741`](https://github.com/apache/spark/commit/01c6741d42ca9390590ed9833f45dbe491c2826f).
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16300 LGTM
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16300 updated.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16300 hm weird that we hard coded the name.
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16290 **[Test build #70236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70236/testReport)** for PR 16290 at commit [`014d7e1`](https://github.com/apache/spark/commit/014d7e1666e89940b66fb42e4cf0f93bfff455d9).
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16300

Could we also update [two more lines](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L64-L66)?

```
> SELECT _FUNC_(10.0, array(0.5, 0.4, 0.1), 100);
 [10.0,10.0,10.0]
> SELECT _FUNC_(10.0, 0.5, 100);
 10.0
```
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249

**[Test build #70235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70235/testReport)** for PR 16249 at commit [`0a40191`](https://github.com/apache/spark/commit/0a401914ff6240cc1f653eb0d873aefed028bfb2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Merged build finished. Test FAILed.
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70235/ Test FAILed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Merged build finished. Test FAILed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70229/ Test FAILed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282

**[Test build #70229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70229/testReport)** for PR 16282 at commit [`5afb13d`](https://github.com/apache/spark/commit/5afb13d5fa4461bd57a748deb157aaa290d6155f).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/16291

@vanzin - Thanks for taking a look and sorry about putting my unfinished sloppy code out there. I updated the bug and the PR with the overall design idea, which hopefully will answer your questions.

>> Each listener should process events serially otherwise you risk getting into funny situations like a task end event being processed before the task start event for the same task.

You are right, that's why the executor service is single threaded, which guarantees ordered processing of the events per listener.
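The ordering argument in this reply can be illustrated with a small sketch that gives each listener its own single-threaded executor. It is not the code from the PR: `PerListenerBus`, `addListener`, `post`, and `stop` are hypothetical names, events are plain strings, and listener registration is assumed to happen from one thread before any events are posted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: one single-threaded executor per listener.
class PerListenerBus {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<ExecutorService> executors = new ArrayList<>();

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
        // A single worker thread runs tasks in submission order, so each listener
        // sees events in the order they were posted.
        executors.add(Executors.newSingleThreadExecutor());
    }

    void post(String event) {
        for (int i = 0; i < listeners.size(); i++) {
            Consumer<String> listener = listeners.get(i);
            executors.get(i).execute(() -> listener.accept(event));
        }
    }

    // Drains the queues; returns false if a listener did not finish in time.
    boolean stop() {
        boolean drained = true;
        for (ExecutorService executor : executors) {
            executor.shutdown();
            try {
                drained &= executor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return drained;
    }

    public static void main(String[] args) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        PerListenerBus bus = new PerListenerBus();
        bus.addListener(seen::add);
        bus.post("taskStart");
        bus.post("taskEnd");
        bus.stop();
        System.out.println(seen);
    }
}
```

A slow listener then delays only its own queue, while the start-before-end ordering within each listener is preserved.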
[GitHub] spark issue #16301: [SPARK-18849][ML][SPARKR][DOC] vignettes final check reo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16301 Merged build finished. Test PASSed.
[GitHub] spark issue #16301: [SPARK-18849][ML][SPARKR][DOC] vignettes final check reo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16301 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70234/ Test PASSed.
[GitHub] spark issue #16301: [SPARK-18849][ML][SPARKR][DOC] vignettes final check reo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16301

**[Test build #70234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70234/testReport)** for PR 16301 at commit [`2f17e1e`](https://github.com/apache/spark/commit/2f17e1e00d19731c241e2e77a522d9dd849073e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16300

**[Test build #3502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3502/testReport)** for PR 16300 at commit [`01dded0`](https://github.com/apache/spark/commit/01dded0dc36a3ea1f178215d31e8659bafdb56ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow went time watermark de...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92754420

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -572,6 +572,10 @@ class Dataset[T] private[sql](
     val parsedDelay = Option(CalendarInterval.fromString("interval " + delayThreshold))
       .getOrElse(throw new AnalysisException(s"Unable to parse time delay '$delayThreshold'"))
+    // Threshold specified in months/years is non-deterministic
--- End diff --

When I have a record that's 29 days late in Feb, what should I expect? If we just want to change a month to a fixed number of days, I'd say use 30, and then document it clearly in the API (e.g. "if month is specified, 1 month = 30 days").
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Build finished. Test FAILed.
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249

**[Test build #70226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70226/consoleFull)** for PR 16249 at commit [`36e35ec`](https://github.com/apache/spark/commit/36e35ece2fe163a6fa361c30a5922d64b910293d).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70226/ Test FAILed.
[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow went time watermark de...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92753969

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -572,6 +572,10 @@ class Dataset[T] private[sql](
     val parsedDelay = Option(CalendarInterval.fromString("interval " + delayThreshold))
       .getOrElse(throw new AnalysisException(s"Unable to parse time delay '$delayThreshold'"))
+    // Threshold specified in months/years is non-deterministic
--- End diff --

What does "safe" mean in this context? Users must not rely on watermarks for correctness, as they can be arbitrarily delayed based on batch boundaries. I think this error actually confuses the point, as it is enforcing precision when this API cannot provide that.
[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow went time watermark de...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92753590

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -572,6 +572,10 @@ class Dataset[T] private[sql](
     val parsedDelay = Option(CalendarInterval.fromString("interval " + delayThreshold))
       .getOrElse(throw new AnalysisException(s"Unable to parse time delay '$delayThreshold'"))
+    // Threshold specified in months/years is non-deterministic
--- End diff --

I think waiting 1 month for late data is a reasonable use case. Based on the definition of the watermark, it's actually okay for us to overestimate this delay too. Why not take the max (31 days, leap year)?
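The suggestion here, treating a month as a fixed and generous number of days, could look roughly like the sketch below. It is an illustration only, not what the PR implements: `WatermarkDelaySketch` and `delayMicros` are hypothetical names, and the `months` and `microseconds` parameters mirror the two public fields of `CalendarInterval`. The days-per-month value is left as a parameter because the thread discusses both 30 and 31.

```java
// Hypothetical sketch: convert a months + microseconds interval to a fixed delay.
class WatermarkDelaySketch {
    static final long MICROS_PER_DAY = 24L * 60 * 60 * 1_000_000L;

    // Approximates each month as daysPerMonth days; throws ArithmeticException on overflow.
    static long delayMicros(int months, long microseconds, int daysPerMonth) {
        long days = (long) months * daysPerMonth;
        return Math.addExact(Math.multiplyExact(days, MICROS_PER_DAY), microseconds);
    }

    public static void main(String[] args) {
        // Overestimating (31 days) keeps state longer than strictly needed,
        // which is acceptable for a watermark.
        System.out.println(delayMicros(1, 0L, 31));
        System.out.println(delayMicros(1, 0L, 30));
    }
}
```

With a fixed conversion the threshold no longer depends on the calendar, so the "non-deterministic" concern in the diff comment goes away at the cost of some imprecision.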
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249 **[Test build #70235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70235/testReport)** for PR 16249 at commit [`0a40191`](https://github.com/apache/spark/commit/0a401914ff6240cc1f653eb0d873aefed028bfb2).
[GitHub] spark issue #16304: [SPARK-18894][SS] Disallow went time watermark delay thr...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16304 event time?
[GitHub] spark issue #16301: [SPARK-18849][ML][SPARKR][DOC] vignettes final check reo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16301 **[Test build #70234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70234/testReport)** for PR 16301 at commit [`2f17e1e`](https://github.com/apache/spark/commit/2f17e1e00d19731c241e2e77a522d9dd849073e7).
[GitHub] spark issue #16303: [SPARK-17807][core] Demote scalatest to "provided" in sp...
Github user ryan-williams commented on the issue: https://github.com/apache/spark/pull/16303

I appreciate the quick turn-around on this, though it seems like a mis-use of the `provided` scope. FWIW, I am advocating for:

- `mv common/tags/src/{main,test}/java/org/apache/spark/tags`;
- publish the `test-jar` for the spark-tags module.
  - I know this is trivial in SBT, and I've seen Maven modules do this - bdgenomics:utils-misc does it with [this config blob](https://github.com/bigdatagenomics/utils/blob/utils-parent_2.11-0.2.10/utils-misc/pom.xml#L46-L56) afaict?
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16142

> I just add a new clean-up mode, but not add the cleaner itself.

But that's kinda the point. How many different ways of cleaning need to be added? Will this one be enough? Will people ask for archiving next? I'm wary of going down that path.

> I think you may not get what I mean.

I get what you mean. I just disagree with you that it's an important feature to have.

> So, I do not think get the size of each log will hurt NameNode greatly.

The current scan code does not make one request to the NameNode per log file in the directory. Your code does. That should be avoided.

> Besides, the unit test has proved that the older file will be cleaned first.

Your code doesn't do that, so if the unit test shows that, it's not by design. Your code is scanning the list of apps in the order they're kept in memory (descending end time). I don't remember whether in-progress apps come first or last. But if they come first, an old attempt of an in-progress app will have precedence over newer attempts of apps that have already finished. If they come last, then you're first accounting for log sizes of apps that have already finished and might end up trying to delete logs from apps that are still running (!!!).

The way the current cleaner code works for time does not work if you're doing the `shouldClean` check solely based on space used. So this feature is not as trivial as your code makes it seem.
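To make the ordering concern concrete, here is a minimal sketch of a space-based selection that avoids both problems raised above. It is not the code from the PR or from Spark's history provider: `LogInfo`, `selectForDeletion` and the field names are hypothetical. It assumes sizes and modification times come from one directory listing (for example the `FileStatus` entries returned by Hadoop's `FileSystem.listStatus`) rather than one NameNode request per file, and it never selects logs of apps that are still running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class EventLogSpaceCleanerSketch {

    /** One entry from a single directory listing; all fields are hypothetical. */
    static final class LogInfo {
        final String path;
        final long sizeBytes;
        final long lastModifiedMs;
        final boolean inProgress;

        LogInfo(String path, long sizeBytes, long lastModifiedMs, boolean inProgress) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMs = lastModifiedMs;
            this.inProgress = inProgress;
        }
    }

    /**
     * Picks finished logs, oldest first, until total usage is at or below maxBytes.
     * Logs of running apps count toward usage but are never selected.
     */
    static List<LogInfo> selectForDeletion(List<LogInfo> listing, long maxBytes) {
        long total = 0;
        List<LogInfo> candidates = new ArrayList<>();
        for (LogInfo log : listing) {
            total += log.sizeBytes;
            if (!log.inProgress) {
                candidates.add(log);
            }
        }
        // Sort explicitly by age instead of relying on the in-memory app order.
        candidates.sort(Comparator.comparingLong((LogInfo l) -> l.lastModifiedMs));

        List<LogInfo> toDelete = new ArrayList<>();
        for (LogInfo log : candidates) {
            if (total <= maxBytes) {
                break;
            }
            toDelete.add(log);
            total -= log.sizeBytes;
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<LogInfo> listing = Arrays.asList(
            new LogInfo("app-1", 400, 1_000L, false),
            new LogInfo("app-2", 300, 2_000L, false),
            new LogInfo("app-3.inprogress", 500, 500L, true));
        for (LogInfo log : selectForDeletion(listing, 900)) {
            System.out.println("would delete " + log.path);
        }
    }
}
```

Even in this simplified form the policy needs an explicit sort and an in-progress check, which supports the point that a size-only `shouldClean` test is not a drop-in variant of the time-based cleaner.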
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16281 The improvement is in how row groups are garbage collected. G1GC puts humongous allocations directly into the old generation, so you end up needing a full GC to reclaim the space. That just increases memory pressure, so you run out of memory and run full GCs and/or spill to disk. We don't have data yet because I haven't pushed the feature or metrics collection for it.
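For readers unfamiliar with the term: as far as I know, G1 treats an object as humongous when it is larger than half of one heap region, so a multi-megabyte `byte[]` easily qualifies with small regions. The sketch below illustrates that rule and one general way to stay under the threshold by allocating in smaller chunks. It is an illustration only, not what Parquet 1.9.0 does; the class and method names are hypothetical, the region size is an assumed 1 MiB, and the array header is ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class HumongousAllocationSketch {

    /** True if an object of this size exceeds half a G1 region (header ignored). */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Allocates totalBytes as several arrays of at most chunkBytes each. */
    static List<byte[]> allocateChunked(int totalBytes, int chunkBytes) {
        List<byte[]> chunks = new ArrayList<>();
        int remaining = totalBytes;
        while (remaining > 0) {
            int size = Math.min(remaining, chunkBytes);
            chunks.add(new byte[size]);
            remaining -= size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        long regionBytes = 1L << 20;   // assumed G1 region size: 1 MiB
        int rowGroupBytes = 8 << 20;   // assumed buffer size: 8 MiB
        int chunkBytes = 1 << 18;      // 256 KiB, well below half a region

        System.out.println("single array humongous: " + isHumongous(rowGroupBytes, regionBytes));
        System.out.println("one chunk humongous:    " + isHumongous(chunkBytes, regionBytes));
        System.out.println("chunks allocated:       " + allocateChunked(rowGroupBytes, chunkBytes).size());
    }
}
```

On a real JVM the region size depends on the heap size or on `-XX:G1HeapRegionSize`, so the threshold has to be checked against the actual configuration.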
[GitHub] spark issue #15211: [SPARK-14709][ML] [WIP] spark.ml API for linear SVM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15211 Merged build finished. Test PASSed.
[GitHub] spark issue #15211: [SPARK-14709][ML] [WIP] spark.ml API for linear SVM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70232/ Test PASSed.
[GitHub] spark issue #15211: [SPARK-14709][ML] [WIP] spark.ml API for linear SVM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15211 **[Test build #70232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70232/testReport)** for PR 15211 at commit [`c8a7553`](https://github.com/apache/spark/commit/c8a75532f049e16186d63ae8eb22f4ea51eb1cde). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16303: [SPARK-17807][core] Demote scalatest to "provided" in sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16303 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70228/ Test PASSed.
[GitHub] spark issue #16303: [SPARK-17807][core] Demote scalatest to "provided" in sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16303 Merged build finished. Test PASSed.
[GitHub] spark issue #16303: [SPARK-17807][core] Demote scalatest to "provided" in sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16303 **[Test build #70228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70228/testReport)** for PR 16303 at commit [`e5772e2`](https://github.com/apache/spark/commit/e5772e220e5eb3b57421463a1f08085cd9b3f69a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16173 Merged build finished. Test PASSed.
[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70227/ Test PASSed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Merged build finished. Test FAILed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282 **[Test build #70219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70219/testReport)** for PR 16282 at commit [`2143462`](https://github.com/apache/spark/commit/21434625f609b89eff30a2dd88b2886f31fa9521). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class AssignStrategy(partitions: Array[TopicPartition]) extends ConsumerStrategy ` * ` case class SubscribeStrategy(topics: Seq[String]) extends ConsumerStrategy ` * ` case class SubscribePatternStrategy(topicPattern: String)` * `class KafkaConsumerGroupIdGenerator(groupIdPrefix: String) `
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70219/ Test FAILed.
[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16173 **[Test build #70227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70227/testReport)** for PR 16173 at commit [`49d799f`](https://github.com/apache/spark/commit/49d799fbbbf49b94af890820a050e395edf24c10). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16232 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70230/ Test PASSed.
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16232 Merged build finished. Test PASSed.
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16232 **[Test build #70230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70230/testReport)** for PR 16232 at commit [`4ef91b2`](https://github.com/apache/spark/commit/4ef91b235758d2e00de988f3f1480d6ebaede527). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16281

@rdblue Interesting, do you have any estimated or actual data for performance improvement? I am interested in how you can achieve performance improvement over `byte[]`.

- Usage of `ByteBuffer`
- Usage of direct `ByteBuffer`
- Usage of multiple buffers described at [the PR](https://github.com/apache/parquet-mr/pull/390)
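The three options in the question map onto standard `java.nio` calls. The sketch below only shows what each one looks like in code; it says nothing about their relative performance and is not taken from parquet-mr. `ByteBufferOptionsSketch` and `readAll` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class ByteBufferOptionsSketch {

    /** Copies the remaining bytes of several buffers into one array, in order. */
    static byte[] readAll(List<ByteBuffer> buffers) {
        int total = 0;
        for (ByteBuffer b : buffers) {
            total += b.remaining();
        }
        byte[] out = new byte[total];
        int offset = 0;
        for (ByteBuffer b : buffers) {
            ByteBuffer view = b.duplicate();  // keep the caller's position untouched
            int n = view.remaining();
            view.get(out, offset, n);
            offset += n;
        }
        return out;
    }

    public static void main(String[] args) {
        // Heap buffer: a view over an ordinary byte[] on the Java heap.
        ByteBuffer heap = ByteBuffer.wrap(new byte[] {1, 2});

        // Direct buffer: memory allocated outside the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(2);
        direct.put((byte) 3).put((byte) 4);
        direct.flip();

        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());

        // Multiple buffers treated as one logical run of bytes.
        System.out.println(Arrays.toString(readAll(Arrays.asList(heap, direct))));
    }
}
```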
[GitHub] spark issue #15109: [SPARK-17501][CORE] Record executor heartbeat timestamp ...
Github user cenyuhai commented on the issue: https://github.com/apache/spark/pull/15109 Ok, I will close this PR. This is not a big problem
[GitHub] spark pull request #15109: [SPARK-17501][CORE] Record executor heartbeat tim...
Github user cenyuhai closed the pull request at: https://github.com/apache/spark/pull/15109
[GitHub] spark issue #16304: [SPARK-18894][SS] Disallow went time watermark delay thr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16304 **[Test build #70233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70233/testReport)** for PR 16304 at commit [`8c603f4`](https://github.com/apache/spark/commit/8c603f46deb7021c9c2b6297be532342d2d8bfab).
[GitHub] spark pull request #16304: [SPARK-18894][SS] Disallow went time watermark de...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16304#discussion_r92746039 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java --- @@ -252,6 +252,9 @@ public static long parseSecondNano(String secondNano) throws IllegalArgumentExce public final int months; public final long microseconds; + /** + * Return the interval in miliseconds, not including the months in interval. + */ public final long milliseconds() { --- End diff -- @rxin Documented as per your request
[GitHub] spark pull request #16304: [SPARK-18894][SS] Follow-up bug fix
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/16304

[SPARK-18894][SS] Follow-up bug fix

## What changes were proposed in this pull request?

Two changes
- Disallow went time watermark delay threshold be specified in months or years
- Following up on #16258, not show watermark when there is no watermarking in the query

## How was this patch tested?

Updated and new unit tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-18834-1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16304.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16304

commit 7fc341195f6cd348d7745646b06e381dc79fe081
Author: Tathagata Das
Date: 2016-12-16T02:45:51Z
No event time stats when no watermark is set

commit 33a7d1b84266b0d50f7bd1cc4e5bb47226d043cd
Author: Tathagata Das
Date: 2016-12-16T03:18:18Z
Disallow dfelay threshold in months or years
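The first change follows from how the interval is stored: months are kept separately from microseconds, and a month has no fixed length, so a delay given in months or years cannot be turned into a fixed number of milliseconds. A minimal sketch of that check, assuming a simplified stand-in for `CalendarInterval` (the `Interval` class, `delayThresholdMs`, the error message and the division by 1000 are illustrative assumptions, not the code in this pull request):

```java
final class WatermarkDelaySketch {

    /** Simplified stand-in for CalendarInterval: months kept apart from microseconds. */
    static final class Interval {
        final int months;
        final long microseconds;

        Interval(int months, long microseconds) {
            this.months = months;
            this.microseconds = microseconds;
        }

        /** Milliseconds in the interval, not including the months part. */
        long milliseconds() {
            return microseconds / 1000;
        }
    }

    /** Returns the delay in milliseconds, rejecting intervals that contain months. */
    static long delayThresholdMs(Interval delay) {
        if (delay.months != 0) {
            throw new IllegalArgumentException(
                "delay threshold must not be specified in months or years");
        }
        return delay.milliseconds();
    }

    public static void main(String[] args) {
        // 10 seconds expressed in microseconds; prints 10000.
        System.out.println(delayThresholdMs(new Interval(0, 10_000_000L)));
    }
}
```

Without the check, an interval of "1 month" would silently yield a delay of 0 ms, because `milliseconds()` ignores the months field.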
[GitHub] spark issue #15211: [SPARK-14709][ML] [WIP] spark.ml API for linear SVM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15211 **[Test build #70232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70232/testReport)** for PR 15211 at commit [`c8a7553`](https://github.com/apache/spark/commit/c8a75532f049e16186d63ae8eb22f4ea51eb1cde).
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16300 **[Test build #3502 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3502/testReport)** for PR 16300 at commit [`01dded0`](https://github.com/apache/spark/commit/01dded0dc36a3ea1f178215d31e8659bafdb56ce).
[GitHub] spark issue #16285: [SPARK-18867] [SQL] Throw cause if IsolatedClientLoad ca...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16285 Thanks for looking into it. In that case, maybe we should just keep the thing as is and not change it?
[GitHub] spark issue #16253: [SPARK-18537][Web UI] Add a REST api to serve spark stre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70231/ Test FAILed.
[GitHub] spark issue #16253: [SPARK-18537][Web UI] Add a REST api to serve spark stre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16253 **[Test build #70231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70231/testReport)** for PR 16253 at commit [`a6b1bb9`](https://github.com/apache/spark/commit/a6b1bb94fe4e382187a204aedc2d5678e4cea0da). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16253: [SPARK-18537][Web UI] Add a REST api to serve spark stre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16253 Merged build finished. Test FAILed.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16300 cc @cloud-fan
[GitHub] spark issue #16299: [MINOR] Only rename SparkR tar.gz if names mismatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16299 Merged build finished. Test PASSed.
[GitHub] spark issue #16299: [MINOR] Only rename SparkR tar.gz if names mismatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70222/ Test PASSed.
[GitHub] spark issue #16299: [MINOR] Only rename SparkR tar.gz if names mismatch
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16299 **[Test build #70222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70222/testReport)** for PR 16299 at commit [`95ac227`](https://github.com/apache/spark/commit/95ac2278dbfc358f0073ef10664a709dade87845). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16300 Merged build finished. Test FAILed.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16300 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70223/ Test FAILed.
[GitHub] spark issue #16300: [SPARK-18892][SQL] Alias percentile_approx approx_percen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16300 **[Test build #70223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70223/testReport)** for PR 16300 at commit [`01dded0`](https://github.com/apache/spark/commit/01dded0dc36a3ea1f178215d31e8659bafdb56ce). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16299: [MINOR] Only rename SparkR tar.gz if names mismatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70220/ Test PASSed.
[GitHub] spark issue #16299: [MINOR] Only rename SparkR tar.gz if names mismatch
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16299 Merged build finished. Test PASSed.
[GitHub] spark issue #16299: [MINOR] Only rename SparkR tar.gz if names mismatch
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16299 **[Test build #70220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70220/testReport)** for PR 16299 at commit [`fcc8265`](https://github.com/apache/spark/commit/fcc8265f20caa4bfc5b4dca41f430718ef4744cc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16253: [SPARK-18537][Web UI] Add a REST api to serve spark stre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16253 **[Test build #70231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70231/testReport)** for PR 16253 at commit [`a6b1bb9`](https://github.com/apache/spark/commit/a6b1bb94fe4e382187a204aedc2d5678e4cea0da).
[GitHub] spark issue #16272: [SPARK-18850][SS]Make StreamExecution and progress class...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70221/ Test PASSed.