[GitHub] spark pull request: [SPARK-10565] [Core] add missing web UI stats ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9472#issuecomment-154350128 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154350451 Build started.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154350428 Build triggered.
[GitHub] spark pull request: [SPARK-11548][DOCS] Replaced example code in m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9519#issuecomment-154350423 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154351785 [Test build #45214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45214/consoleFull) for PR 9503 at commit [`2273f87`](https://github.com/apache/spark/commit/2273f87048854d5ccb8cdc855ed8870c241a20a7).
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r44117245

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -965,6 +969,80 @@ case class ScalaUDF(
   }
   // scalastyle:on
+
+  // Generate codes used to convert the arguments to Scala type for user-defined funtions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): String = {
+    val converterClassName = classOf[Any => Any].getName
+    val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$"
+    val expressionClassName = classOf[Expression].getName
+    val scalaUDFClassName = classOf[ScalaUDF].getName
+
+    val converterTerm = ctx.freshName("converter")
+    ctx.addMutableState(converterClassName, converterTerm,
+      s"this.$converterTerm = ($converterClassName)$typeConvertersClassName" +
+      s".createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)" +
+      s"expressions[${ctx.references.size - 1}]).getChildren().apply($index))).dataType());")
+    converterTerm
+  }
+
+  override def genCode(
+      ctx: CodeGenContext,
+      ev: GeneratedExpressionCode): String = {
+
+    ctx.references += this
+
+    val scalaUDFClassName = classOf[ScalaUDF].getName
+    val converterClassName = classOf[Any => Any].getName
+    val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$"
+    val expressionClassName = classOf[Expression].getName
+
+    // Generate codes used to convert the returned value of user-defined functions to Catalyst type
+    val catalystConverterTerm = ctx.freshName("catalystConverter")
+    ctx.addMutableState(converterClassName, catalystConverterTerm,
+      s"this.$catalystConverterTerm = ($converterClassName)$typeConvertersClassName" +
+        s".createToCatalystConverter((($scalaUDFClassName)expressions" +
+        s"[${ctx.references.size - 1}]).dataType());")
+
+    val resultTerm = ctx.freshName("result")
+
+    // This must be called before children expressions' codegen
+    // because ctx.references is used in genCodeForConverter
--- End diff --

I've added that. Thanks.
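The ordering constraint stated in the last comment of the quoted diff can be illustrated with a toy model. This is only a sketch under assumptions: `ToyContext`, `ReferenceOrderDemo`, and their methods are hypothetical names invented here, not Spark's `CodeGenContext` API. The only thing borrowed from the diff is the idea of appending to a `references` buffer and then reading `references.size - 1` as the index to embed in generated code.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for a codegen context; hypothetical, not Spark's CodeGenContext.
final class ToyContext {
  val references: ArrayBuffer[AnyRef] = ArrayBuffer.empty
}

object ReferenceOrderDemo {
  // Simulates a child expression that registers itself during its own codegen.
  def childCodegen(ctx: ToyContext): Unit = ctx.references += new Object

  // Index is read right after registering `self`, before any child codegen runs.
  def indexCapturedFirst(self: AnyRef): (ToyContext, Int) = {
    val ctx = new ToyContext
    ctx.references += self
    val idx = ctx.references.size - 1
    childCodegen(ctx)
    (ctx, idx)
  }

  // Index is read after a child has appended, so size - 1 points at the child.
  def indexCapturedLate(self: AnyRef): (ToyContext, Int) = {
    val ctx = new ToyContext
    ctx.references += self
    childCodegen(ctx)
    (ctx, ctx.references.size - 1)
  }
}
```

In this model, `indexCapturedFirst` returns an index that still resolves to `self`, while `indexCapturedLate` returns one that resolves to the child's entry, which is the kind of mismatch the comment in the diff appears to guard against.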
[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154352102 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154352052 **[Test build #45207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45207/consoleFull)** for PR 9406 at commit [`9be5b9d`](https://github.com/apache/spark/commit/9be5b9d9e3c9de81473ac93750b687e2de824bb2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `final class ShuffleSortDataFormat extends SortDataFormat`
   * `final class UnsafeSortDataFormat extends SortDataFormat`
   * `case class Expand(`
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154352789 **[Test build #45208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45208/consoleFull)** for PR 9143 at commit [`f3f79dd`](https://github.com/apache/spark/commit/f3f79dda27cb35b22d00ff5709571ec26f1cab6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Record(data: ByteBuffer, time: Long, promise: Promise[WriteAheadLogRecordHandle])`
[GitHub] spark pull request: SPARK-6541 - Sort executors by ID (numeric)
Github user jbonofre commented on the pull request: https://github.com/apache/spark/pull/9165#issuecomment-154353494 Let me prepare a couple of screenshots to illustrate the sort order.
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154353164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45208/ Test PASSed.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154353831 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45214/ Test FAILed.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154353812 [Test build #45214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45214/console) for PR 9503 at commit [`2273f87`](https://github.com/apache/spark/commit/2273f87048854d5ccb8cdc855ed8870c241a20a7).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class DatabaseOnDocker`
   * `class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite`
   * `assert(types(0).equals("class java.lang.Integer"))`
   * `assert(types(1).equals("class java.lang.String"))`
   * `assert(types(0).equals("class java.lang.Boolean"))`
   * `assert(types(1).equals("class java.lang.Long"))`
   * `assert(types(2).equals("class java.lang.Integer"))`
   * `assert(types(3).equals("class java.lang.Integer"))`
   * `assert(types(4).equals("class java.lang.Integer"))`
   * `assert(types(5).equals("class java.lang.Long"))`
   * `assert(types(6).equals("class java.math.BigDecimal"))`
   * `assert(types(7).equals("class java.lang.Double"))`
   * `assert(types(8).equals("class java.lang.Double"))`
   * `assert(types(0).equals("class java.sql.Date"))`
   * `assert(types(1).equals("class java.sql.Timestamp"))`
   * `assert(types(2).equals("class java.sql.Timestamp"))`
   * `assert(types(3).equals("class java.sql.Timestamp"))`
   * `assert(types(4).equals("class java.sql.Date"))`
   * `assert(types(0).equals("class java.lang.String"))`
   * `assert(types(1).equals("class java.lang.String"))`
   * `assert(types(2).equals("class java.lang.String"))`
   * `assert(types(3).equals("class java.lang.String"))`
   * `assert(types(4).equals("class java.lang.String"))`
   * `assert(types(5).equals("class java.lang.String"))`
   * `assert(types(6).equals("class [B"))`
   * `assert(types(7).equals("class [B"))`
   * `assert(types(8).equals("class [B"))`
   * `class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite`
   * `assert(types(0).equals("class java.lang.String"))`
   * `assert(types(1).equals("class java.lang.Integer"))`
   * `assert(types(2).equals("class java.lang.Double"))`
   * `assert(types(3).equals("class java.lang.Long"))`
   * `assert(types(4).equals("class java.lang.Boolean"))`
   * `assert(types(5).equals("class [B"))`
   * `assert(types(6).equals("class [B"))`
   * `assert(types(7).equals("class java.lang.Boolean"))`
   * `assert(types(8).equals("class java.lang.String"))`
   * `assert(types(9).equals("class java.lang.String"))`
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154353160 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154356583 Merged build started.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154356522 Merged build triggered.
[GitHub] spark pull request: [SPARK-9065][Streaming][PySpark] Add MessageHa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7410#issuecomment-154359198 Merged build started.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154360557 **[Test build #45215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45215/consoleFull)** for PR 9503 at commit [`65315c4`](https://github.com/apache/spark/commit/65315c4f07057d02b872ed7a9728f4292f73d466).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class DatabaseOnDocker`
   * `class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite`
   * `assert(types(0).equals("class java.lang.Integer"))`
   * `assert(types(1).equals("class java.lang.String"))`
   * `assert(types(0).equals("class java.lang.Boolean"))`
   * `assert(types(1).equals("class java.lang.Long"))`
   * `assert(types(2).equals("class java.lang.Integer"))`
   * `assert(types(3).equals("class java.lang.Integer"))`
   * `assert(types(4).equals("class java.lang.Integer"))`
   * `assert(types(5).equals("class java.lang.Long"))`
   * `assert(types(6).equals("class java.math.BigDecimal"))`
   * `assert(types(7).equals("class java.lang.Double"))`
   * `assert(types(8).equals("class java.lang.Double"))`
   * `assert(types(0).equals("class java.sql.Date"))`
   * `assert(types(1).equals("class java.sql.Timestamp"))`
   * `assert(types(2).equals("class java.sql.Timestamp"))`
   * `assert(types(3).equals("class java.sql.Timestamp"))`
   * `assert(types(4).equals("class java.sql.Date"))`
   * `assert(types(0).equals("class java.lang.String"))`
   * `assert(types(1).equals("class java.lang.String"))`
   * `assert(types(2).equals("class java.lang.String"))`
   * `assert(types(3).equals("class java.lang.String"))`
   * `assert(types(4).equals("class java.lang.String"))`
   * `assert(types(5).equals("class java.lang.String"))`
   * `assert(types(6).equals("class [B"))`
   * `assert(types(7).equals("class [B"))`
   * `assert(types(8).equals("class [B"))`
   * `class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite`
   * `assert(types(0).equals("class java.lang.String"))`
   * `assert(types(1).equals("class java.lang.Integer"))`
   * `assert(types(2).equals("class java.lang.Double"))`
   * `assert(types(3).equals("class java.lang.Long"))`
   * `assert(types(4).equals("class java.lang.Boolean"))`
   * `assert(types(5).equals("class [B"))`
   * `assert(types(6).equals("class [B"))`
   * `assert(types(7).equals("class java.lang.Boolean"))`
   * `assert(types(8).equals("class java.lang.String"))`
   * `assert(types(9).equals("class java.lang.String"))`
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154360561 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45215/ Test FAILed.
[GitHub] spark pull request: [SPARK-9065][Streaming][PySpark] Add MessageHa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7410#issuecomment-154360597 **[Test build #45216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45216/consoleFull)** for PR 7410 at commit [`ed4d179`](https://github.com/apache/spark/commit/ed4d1791c8ca7b7260db17a198e990033fc901f5).
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-154360559 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-154361228 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-154361230 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45201/ Test FAILed.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-154364662 **[Test build #45217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45217/consoleFull)** for PR 9287 at commit [`b83ab2c`](https://github.com/apache/spark/commit/b83ab2ca6c2d6d05feae1cd3e06ff4f0bcabe832).
[GitHub] spark pull request: [SPARK-11507] [MLlib] add compact in Matrices ...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/9520

[SPARK-11507] [MLlib] add compact in Matrices fromBreeze

JIRA: https://issues.apache.org/jira/browse/SPARK-11507

"In certain situations when adding two block matrices, I get an error regarding colPtr and the operation fails. External issue URL includes full error and code for reproducing the problem."

Root cause: colPtr.last does NOT always equal values.length in a breeze CSCMatrix, which fails the require in SparseMatrix.

Easy steps to reproduce:
```
// Imports assumed by the snippet: BM is an alias for breeze's Matrix.
import breeze.linalg.{CSCMatrix, Matrix => BM}
import org.apache.spark.mllib.linalg.Matrices

val m1: BM[Double] = new CSCMatrix[Double](Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
val m2: BM[Double] = new CSCMatrix[Double](Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
val sum = m1 + m2
Matrices.fromBreeze(sum)
```

Solution: Judging from the code in [CSCMatrix](https://github.com/scalanlp/breeze/blob/28000a7b901bc3cfbbbf5c0bce1d0a5dda8281b0/math/src/main/scala/breeze/linalg/CSCMatrix.scala), a CSCMatrix in breeze can have extra zeros at the end of its data array. Invoking compact makes sure it satisfies the require in SparseMatrix. This should add limited overhead, as the actual compact operation is only performed when necessary.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark matricesFromBreeze

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9520.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9520

commit 50eee83d39d146df83d4aa9d76a1cad49669f9b1
Author: Yuhao Yang
Date:   2015-11-06T09:32:37Z

    add compact in Matrices fromBreeze
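The root cause described in the pull request above can be sketched without breeze or Spark. The following is a minimal illustration under assumptions: `Csc`, `isCompact`, and `CompactDemo` are hypothetical names made up for this sketch, and the check in `isCompact` only mirrors the condition the PR description attributes to SparseMatrix (the last column pointer equal to the length of the values array). It is not the actual breeze `compact` implementation.

```scala
// Hypothetical stand-in for a compressed sparse column matrix;
// this is neither breeze's CSCMatrix nor Spark's SparseMatrix.
final case class Csc(
    numRows: Int,
    numCols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]) {

  // Number of stored entries, as recorded by the last column pointer.
  def used: Int = colPtrs.last

  // True when the backing arrays carry no unused trailing capacity.
  def isCompact: Boolean = values.length == used && rowIndices.length == used

  // Trim unused trailing capacity; returns this unchanged when already compact.
  def compact: Csc =
    if (isCompact) this
    else copy(rowIndices = rowIndices.take(used), values = values.take(used))
}

object CompactDemo {
  // A 3x3 matrix with 4 stored entries but backing arrays of length 6,
  // imitating the trailing zeros the PR describes.
  val padded: Csc = Csc(
    numRows = 3,
    numCols = 3,
    colPtrs = Array(0, 1, 2, 4),
    rowIndices = Array(0, 1, 1, 2, 0, 0),
    values = Array(1.0, 3.0, 2.0, 5.0, 0.0, 0.0))

  val trimmed: Csc = padded.compact
}
```

Here `padded.isCompact` is false while `trimmed.isCompact` is true, and calling `compact` on an already compact matrix returns the same instance, which matches the PR's remark that the work is only done when necessary.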
[GitHub] spark pull request: [SPARK-11507] [MLlib] add compact in Matrices ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9520#issuecomment-154366500 Merged build triggered.
[GitHub] spark pull request: [SPARK-11389][CORE] Add support for off-heap m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9344#issuecomment-154377070 Merged build triggered.
[GitHub] spark pull request: [SPARK-11389][CORE] Add support for off-heap m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9344#issuecomment-154377089 Merged build started.
[GitHub] spark pull request: [SPARK-11507] [MLlib] add compact in Matrices ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9520#issuecomment-154377374 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45218/ Test PASSed.
[GitHub] spark pull request: [SPARK-11507] [MLlib] add compact in Matrices ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9520#issuecomment-154377270 **[Test build #45218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45218/consoleFull)** for PR 9520 at commit [`50eee83`](https://github.com/apache/spark/commit/50eee83d39d146df83d4aa9d76a1cad49669f9b1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11507] [MLlib] add compact in Matrices ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9520#issuecomment-154377373 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10978] [SQL] [FOLLOW-UP] More comprehen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9468#issuecomment-154378200 **[Test build #45212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45212/consoleFull)** for PR 9468 at commit [`433632a`](https://github.com/apache/spark/commit/433632ad21b42cd9a5a636febbcabe6a9d9eb56a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class FilteredScanSuite extends DataSourceTest with SharedSQLContext with PredicateHelper`
   * `class SimpleTextHadoopFsRelationSuite extends HadoopFsRelationTest with PredicateHelper`
[GitHub] spark pull request: [SPARK-10978] [SQL] [FOLLOW-UP] More comprehen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9468#issuecomment-154378538 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45212/ Test PASSed.
[GitHub] spark pull request: [SPARK-10978] [SQL] [FOLLOW-UP] More comprehen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9468#issuecomment-154378537 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11389][CORE] Add support for off-heap m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9344#issuecomment-154380899 **[Test build #45221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45221/consoleFull)** for PR 9344 at commit [`c761736`](https://github.com/apache/spark/commit/c761736643c1053ae04aed4e1ab227709a0ad418).
[GitHub] spark pull request: [SPARK-11450][SQL] Add Unsafe Row processing t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9414#issuecomment-154381079 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11450][SQL] Add Unsafe Row processing t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9414#issuecomment-154380928 **[Test build #45213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45213/consoleFull)** for PR 9414 at commit [`26f6f7e`](https://github.com/apache/spark/commit/26f6f7ed1e43399ac2558e741881ec3af8ee68dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `final class ShuffleSortDataFormat extends SortDataFormat` * `final class UnsafeSortDataFormat extends SortDataFormat`
[GitHub] spark pull request: [SPARK-11450][SQL] Add Unsafe Row processing t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9414#issuecomment-154381081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45213/ Test PASSed.
[GitHub] spark pull request: [SPARK-9478] [ml] Add class weights to Random ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9008#issuecomment-154381876 [Test build #45206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45206/console) for PR 9008 at commit [`c1785a8`](https://github.com/apache/spark/commit/c1785a8f3055bc48ce480b827befcb27812f0449). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9478] [ml] Add class weights to Random ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9008#issuecomment-154381940 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9478] [ml] Add class weights to Random ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9008#issuecomment-154381942 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45206/ Test FAILed.
[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...
GitHub user pravingadakh opened a pull request: https://github.com/apache/spark/pull/9522 [SPARK-11535][ML] handling empty string in StringIndexer Replaces "" (empty, not null) with the string "EMPTY_STRING" in StringIndexer. Another approach is to use "0" (or the next available integer), but it may have performance issues when the input column already holds integer values, say 0 to 10. We can use a different replacement string if "EMPTY_STRING" is commonly used as a real value. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pravingadakh/spark SPARK-11535 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9522 commit 66eba8e322e170e6570900cbf0b2802947d95781 Author: Pravin Gadakh Date: 2015-11-06T11:35:48Z handling empty string in StringIndexer
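The idea in the description above can be sketched in plain Scala, without Spark. This is only an illustration and not the code from the pull request: the object `EmptyStringIndexSketch` and its methods `normalize` and `fit` are hypothetical names, and the alphabetical tie-break is an assumption added to keep the result deterministic. Labels are indexed by descending frequency, as StringIndexer does by default.

```scala
object EmptyStringIndexSketch {
  // Placeholder string named in the PR description.
  val Placeholder = "EMPTY_STRING"

  // Map the empty string to the placeholder; every other label is unchanged.
  def normalize(label: String): String =
    if (label == "") Placeholder else label

  // Build a label -> index map. The most frequent label gets index 0.0.
  // Ties are broken alphabetically (an assumption made for this sketch).
  def fit(labels: Seq[String]): Map[String, Double] = {
    val counts = labels
      .map(normalize)
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }

    counts.toSeq
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .map { case (label, i) => (label, i.toDouble) }
      .toMap
  }
}
```

A caveat that follows from the description itself: if the input already contains the literal value "EMPTY_STRING", it becomes indistinguishable from a replaced empty string, which is why the author suggests choosing another replacement when that value is common.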
[GitHub] spark pull request: SPARK-11295 Add packages to JUnit output for P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9263#issuecomment-154479089 **[Test build #45227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45227/consoleFull)** for PR 9263 at commit [`8dc37f8`](https://github.com/apache/spark/commit/8dc37f819592f69b2c51c6c8c62864cc4f7e5311). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP] [SPARK-5565] [ML] LDA wrapper for Pipeli...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/9513#issuecomment-154480232 @jkbradley Thanks for looping me in and welcome back. I'll try to send some feedback this weekend.
[GitHub] spark pull request: Typo fixes + code readability improvements
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9501#issuecomment-154383561 **[Test build #1996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1996/consoleFull)** for PR 9501 at commit [`aa911d5`](https://github.com/apache/spark/commit/aa911d580421b0025ca32ecce814117287853799). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `final class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: String)` * `final class GBTRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: String)` * `class IsotonicRegression @Since("1.5.0") (@Since("1.5.0") override val uid: String)` * `class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String)` * `final class RandomForestRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: String)`
[GitHub] spark pull request: [SPARK-11554][SQL] add map/flatMap to GroupedD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9521#issuecomment-154388797 Merged build started.
[GitHub] spark pull request: [SPARK-11554][SQL] add map/flatMap to GroupedD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9521#issuecomment-154388703 Merged build triggered.
[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154398778 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45219/ Test PASSed.
[GitHub] spark pull request: [SPARK-11479][SQL] add kmeans example for Data...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9436#issuecomment-154398789 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154398663 **[Test build #45219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45219/consoleFull)** for PR 9406 at commit [`d3bdb2b`](https://github.com/apache/spark/commit/d3bdb2bb0096663af7d9faf4c0963a3df00065aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Expand(`
[GitHub] spark pull request: [SPARK-8992] [SQL] Add pivot to dataframe api
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/7841#issuecomment-154477941 @aray sorry was away for spark summit - back now and will get to this today.
[GitHub] spark pull request: [SPARK-10565] [Core] add missing web UI stats ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9472#issuecomment-154479994 Build triggered. sha1 is original commit.
[GitHub] spark pull request: [SPARK-10565] [Core] add missing web UI stats ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9472#issuecomment-154480025 Build started sha1 is original commit.
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9089#issuecomment-154498875 Build started sha1 is merged.
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154498818 ok to test
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44172629 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.mllib.api.python + +import scala.collection.JavaConverters + +import org.apache.spark.SparkContext +import org.apache.spark.mllib.clustering.LDAModel +import org.apache.spark.mllib.linalg.Matrix + +/** + * Wrapper around LDAModel to provide helper methods in Python + */ +private[python] class LDAModelWrapper(model: LDAModel) { + + def topicsMatrix(): Matrix = model.topicsMatrix + + def vocabSize(): Int = model.vocabSize + + def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize) + + def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = { + +val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) => +Array.empty[Any] ++ terms ++ termWeights + }.toSeq +JavaConverters.seqAsJavaListConverter(seq).asJava --- End diff -- you could return this in Java : List
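The flattening done by `describeTopics` in the diff above can be illustrated outside Spark. This is a minimal sketch with a hypothetical helper `toJavaRows`: it takes topics as (term indices, term weights) pairs, which is the shape the diff pattern-matches on, and builds the `java.util.List[Array[Any]]` by hand so that it does not depend on any particular Scala collection-converter API.

```scala
object DescribeTopicsSketch {
  // Flatten each (terms, weights) pair into one row: all term indices
  // first, then all weights, mirroring `Array.empty[Any] ++ terms ++ termWeights`.
  def toJavaRows(
      topics: Array[(Array[Int], Array[Double])]): java.util.List[Array[Any]] = {
    val rows = new java.util.ArrayList[Array[Any]](topics.length)
    topics.foreach { case (terms, weights) =>
      val row: Array[Any] = terms.map(t => (t: Any)) ++ weights.map(w => (w: Any))
      rows.add(row)
    }
    rows
  }
}
```

With this layout a consumer has to split each row at its midpoint to recover the two halves, since the row mixes integer indices and double weights in a single untyped array.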
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9089#issuecomment-154498839 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154499244 ok to test
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-154499721 LGTM, merging this into master and 1.6 branch.
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44173319 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +491,12 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false registerReceiver(streamId, typ, host, executorId, receiverEndpoint, context.senderAddress) context.reply(successful) case AddBlock(receivedBlockInfo) => -context.reply(addBlock(receivedBlockInfo)) +if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) { + val f = Future(addBlock(receivedBlockInfo))(walBatchingThreadPool) + f.onComplete(result => context.reply(result.get))(walBatchingThreadPool) --- End diff -- Just realized there is a race condition here that we need to think about. What will happen if the code in the Future runs after `ReceiverTracker.stop`?
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44173447 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +491,12 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false registerReceiver(streamId, typ, host, executorId, receiverEndpoint, context.senderAddress) context.reply(successful) case AddBlock(receivedBlockInfo) => -context.reply(addBlock(receivedBlockInfo)) +if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) { + val f = Future(addBlock(receivedBlockInfo))(walBatchingThreadPool) + f.onComplete(result => context.reply(result.get))(walBatchingThreadPool) --- End diff -- Is it safe to just catch NonFatal exceptions and log them in the Future body and `onComplete`?
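The two concerns raised in this thread (work scheduled in a Future running after `stop`, and whether catching `NonFatal` is enough) can be sketched with the standard library alone. This is one possible shape, not the fix adopted in the pull request: `Tracker`, `handleAddBlock`, and the `stopped` flag are hypothetical stand-ins for `ReceiverTracker`, and replying `false` after stop or on failure is an assumption made for the sketch.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object GuardedReplySketch {
  // Hypothetical stand-in for the tracker; this is not Spark's ReceiverTracker.
  class Tracker {
    private val stopped = new AtomicBoolean(false)

    def stop(): Unit = stopped.set(true)

    // Runs `addBlock` asynchronously and always sends exactly one reply.
    // The stop flag is re-checked inside the task because the task may be
    // scheduled before stop() but executed after it.
    def handleAddBlock(addBlock: () => Boolean)(reply: Boolean => Unit): Future[Boolean] =
      Future {
        val ok =
          if (stopped.get()) {
            false // assumption: report failure instead of touching stopped state
          } else {
            try {
              addBlock()
            } catch {
              case NonFatal(e) =>
                // A real implementation would log through its logging facility.
                System.err.println(s"addBlock failed: ${e.getMessage}")
                false
            }
          }
        reply(ok)
        ok
      }(ExecutionContext.global)
  }
}
```

Compared with `f.onComplete(result => context.reply(result.get))`, this variant never rethrows inside the callback, so the caller receives a reply even when `addBlock` fails. Fatal errors such as `OutOfMemoryError` are deliberately not caught, because `NonFatal` does not match them.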
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9270
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154501691 add to whitelist
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-154501697 @nilmeier thanks. This document would help you with contributing to Spark. https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154502425 **[Test build #45235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45235/consoleFull)** for PR 9517 at commit [`bcf72d3`](https://github.com/apache/spark/commit/bcf72d3ca308f9a69993803d9c8939696c915b07).
[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154503830 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45234/
[GitHub] spark pull request: [SPARK-10978] [SQL] [FOLLOW-UP] More comprehen...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9468#issuecomment-154503953 LGTM. Merging to master and branch 1.6.
[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154503826 Build finished. No test results found.
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154503795 Build started sha1 is merged.
[GitHub] spark pull request: [SPARK-9858] [SQL] Add an ExchangeCoordinator ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9453
[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154505090 test this please
[GitHub] spark pull request: [SPARK-10387][ML][WIP] Add code gen for gbt
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9524#issuecomment-154505864 Please don't use macros and quasiquotes. They are slow and not IDE friendly. We replaced them with Janino in SQL.
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154508219

I think it would be natural to fix the Scala error message and fix the condition in Python. The algorithm does not check whether the indices contain duplicate values; it checks whether the indices are sorted. @srowen What do you think?

## Python

Replace `>=` with `>`.

```
if self.indices[i] > self.indices[i + 1]:
    raise TypeError("indices array must be sorted")
```

## Scala

Change the message.

```
require(prev < i, s"indices array must be sorted: $i.")
```
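The difference between the two conditions discussed in this comment can be seen in a small standalone sketch. It is not the PySpark or Scala implementation: `check_indices`, its `strict` flag, and `accepts` are hypothetical names used only for illustration. With `>` as the failing condition, equal neighbours pass, so duplicate indices are accepted; a strict `prev < i` style requirement rejects them.

```python
def check_indices(indices, strict):
    """Raise TypeError if `indices` is out of order.

    strict=False mirrors a check that fails only when a neighbour is larger
    (duplicates pass); strict=True mirrors a `prev < i` requirement, which
    also rejects duplicates. Hypothetical helper, for illustration only.
    """
    for i in range(len(indices) - 1):
        left, right = indices[i], indices[i + 1]
        out_of_order = left >= right if strict else left > right
        if out_of_order:
            raise TypeError("indices array must be sorted")


def accepts(indices, strict):
    """Return True when check_indices does not raise."""
    try:
        check_indices(indices, strict)
        return True
    except TypeError:
        return False
```

Under these assumptions, `accepts([0, 2, 2], strict=False)` holds while `accepts([0, 2, 2], strict=True)` does not, so the choice of operator decides whether "sorted" also means "no duplicates".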
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44176718 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.util + +import java.nio.ByteBuffer +import java.util.concurrent.LinkedBlockingQueue +import java.util.{Iterator => JIterator} + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.{Await, Promise} +import scala.concurrent.duration._ +import scala.util.control.NonFatal + +import org.apache.spark.{SparkException, Logging} +import org.apache.spark.util.Utils + +/** + * A wrapper for a WriteAheadLog that batches records before writing data. Handles aggregation + * during writes, and de-aggregation in the `readAll` method. The end consumer has to handle + * de-aggregation after the `read` method. In addition, the `WriteAheadLogRecordHandle` returned + * after the write will contain the batch of records rather than individual records. 
+ * + * When writing a batch of records, the `time` passed to the `wrappedLog` will be the timestamp + * of the latest record in the batch. This is very important in achieving correctness. Consider the + * following example: + * We receive records with timestamps 1, 3, 5, 7. We use "log-1" as the filename. Once we receive + * a clean up request for timestamp 3, we would clean up the file "log-1", and lose data regarding + * 5 and 7. + * + * All other methods of the WriteAheadLog interface will be passed on to the wrapped WriteAheadLog. + */ +private[util] class BatchedWriteAheadLog(val wrappedLog: WriteAheadLog) + extends WriteAheadLog with Logging { + + import BatchedWriteAheadLog._ + + // exposed for tests + private val walWriteQueue = new LinkedBlockingQueue[Record]() + + // Whether the writer thread is active + @volatile private var active: Boolean = true + private val buffer = new ArrayBuffer[Record]() + + private val batchedWriterThread = startBatchedWriterThread() + + /** + * Write a byte buffer to the log file. This method adds the byteBuffer to a queue and blocks + * until the record is properly written by the parent. + */ + override def write(byteBuffer: ByteBuffer, time: Long): WriteAheadLogRecordHandle = { +val promise = Promise[WriteAheadLogRecordHandle]() +walWriteQueue.offer(Record(byteBuffer, time, promise)) +Await.result(promise.future, WAL_WRITE_STATUS_TIMEOUT.milliseconds) + } + + /** + * Read a segment from an existing Write Ahead Log. The data may be aggregated, and the user + * should de-aggregate using [[BatchedWriteAheadLog.deaggregate]] + * + * This method is handled by the parent WriteAheadLog. + */ + override def read(segment: WriteAheadLogRecordHandle): ByteBuffer = { +wrappedLog.read(segment) + } + + /** + * Read all the existing logs from the log directory. The output of the wrapped WriteAheadLog + * will be de-aggregated. 
+ */ + override def readAll(): JIterator[ByteBuffer] = { +wrappedLog.readAll().asScala.flatMap(deaggregate).asJava + } + + /** + * Delete the log files that are older than the threshold time. + * + * This method is handled by the parent WriteAheadLog. + */ + override def clean(threshTime: Long, waitForCompletion: Boolean): Unit = { +wrappedLog.clean(threshTime, waitForCompletion) + } + + + /** + * Stop the batched writer thread, fulfill promises with failures and close the wrapped WAL. + */ + override def close(): Unit = { +logInfo("BatchedWriteAheadLog shutting down.") +active = false +batchedWriterThread.interrupt() +batchedWriterThread.join() +
[GitHub] spark pull request: [SPARK-10565] [Core] add missing web UI stats ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9472#issuecomment-154509201 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-10565] [Core] add missing web UI stats ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9472#issuecomment-154509236 Build started. sha1 is merged.
[GitHub] spark pull request: [SPARK-10565] [Core] add missing web UI stats ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9472#issuecomment-154509884 **[Test build #45238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45238/consoleFull)** for PR 9472 at commit [`c91e928`](https://github.com/apache/spark/commit/c91e92859f8bbd922005d88d2e57bb12dd1f9f76).
[GitHub] spark pull request: [SPARK-10371] [SQL] Implement subexpr eliminat...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9480#discussion_r44177613 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import scala.collection.mutable + +/** + * This class is used to compute equality of (sub)expression trees. Expressions can be added + * to this class and they subsequently query for expression equality. Expression trees are + * considered equal if for the same input(s), the same result is produced. + */ +class EquivalentExpressions { + /** + * Wrapper around an Expression that provides semantic equality. + */ + case class Expr(e: Expression) { +val hash = e.semanticHash() +override def equals(o: Any): Boolean = o match { + case other: Expr => e.semanticEquals(other.e) + case _ => false +} +override def hashCode: Int = hash + } + + // For each expression, the set of equivalent expressions. 
+ private val equivalenceMap: mutable.HashMap[Expr, mutable.MutableList[Expression]] = + new mutable.HashMap[Expr, mutable.MutableList[Expression]] + + /** + * Adds each expression to this data structure, grouping them with existing equivalent + * expressions. Non-recursive. + * Returns if there was already a matching expression. + */ + def addExpr(expr: Expression): Boolean = { +if (expr.deterministic) { + val e: Expr = Expr(expr) + val f = equivalenceMap.get(e) + if (f.isDefined) { +f.get.+= (expr) +true + } else { +equivalenceMap.put(e, mutable.MutableList(expr)) +false + } +} else { + false +} + } + + /** + * Adds the expression to this datastructure recursively. Stops if a matching expression + * is found. That is, if `expr` has already been added, its children are not added. + * If ignoreLeaf is true, leaf nodes are ignored. + */ + def addExprTree(root: Expression, ignoreLeaf: Boolean): Unit = { --- End diff -- Since ignoreLeaf is only useful for testing, we can have a default `true` here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
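A minimal Python sketch of the grouping that `addExpr` and `addExprTree` perform in the diff above, assuming expressions can be reduced to a hashable canonical key. The `canonical` function, the tuple encoding of expressions, and the order-insensitive treatment of `+` are stand-ins for `semanticHash`/`semanticEquals`, not Catalyst's actual rules, and the determinism check is omitted. The `ignore_leaf=True` default follows the suggestion in the review comment.

```python
def canonical(expr):
    """Hypothetical stand-in for semantic equality: '+' ignores operand order."""
    if isinstance(expr, tuple):
        op, *args = expr
        args = [canonical(a) for a in args]
        if op == "+":
            args = sorted(args, key=repr)
        return (op, *args)
    return expr


class EquivalentExpressions:
    def __init__(self):
        # canonical key -> list of the original expressions in that group
        self._groups = {}

    def add_expr(self, expr):
        """Add expr to its group; return True if an equivalent one was present."""
        key = canonical(expr)
        group = self._groups.get(key)
        if group is not None:
            group.append(expr)
            return True
        self._groups[key] = [expr]
        return False

    def add_expr_tree(self, root, ignore_leaf=True):
        """Add root recursively; do not descend once a match is found."""
        is_leaf = not isinstance(root, tuple)
        if is_leaf and ignore_leaf:
            return
        if not self.add_expr(root) and not is_leaf:
            for child in root[1:]:
                self.add_expr_tree(child, ignore_leaf)

    def equivalents(self, expr):
        """Return the expressions recorded as equivalent to expr."""
        return list(self._groups.get(canonical(expr), []))
```

With a default in place, callers outside the tests can write `add_expr_tree(root)` and only test code needs to pass `ignore_leaf=False`.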
[GitHub] spark pull request: [SPARK-10371] [SQL] Implement subexpr eliminat...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9480#discussion_r44178151 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -341,7 +440,18 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin protected def declareMutableStates(ctx: CodeGenContext): String = { ctx.mutableStates.map { case (javaType, variableName, _) => s"private $javaType $variableName;" -}.mkString("\n") +}.mkString("\n") + "\n" + +// Maintain the loaded value and isNull as member variables. This is necessary if the codegen +// function is split across multiple functions. +// TODO: maintaining this as a local variable probably allows the compiler to do better +// optimizations. +ctx.subExprEliminationStates.map { s => { --- End diff -- Should we use a separate method for this part?
[GitHub] spark pull request: [SPARK-10387][ML][WIP] Add code gen for gbt
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/9524#issuecomment-154510939 @rxin OK, I'll update this to use Janino. I'll miss my compile-time type checking, but I figured I'd probably have to do that. In the meantime, does the API look like an OK place to start?
[GitHub] spark pull request: [SPARK-11410] [PYSPARK] Add python bindings fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9504#issuecomment-154511541 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-11410] [PYSPARK] Add python bindings fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9504#issuecomment-154511574 Build started. sha1 is merged.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user nilmeier commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-154499019 Thank you for the comments, Yu. I'll update these in the next few days. Best, Jerome
[GitHub] spark pull request: [SPARK-11077] [SQL] Join elimination in Cataly...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9089#issuecomment-154499514 **[Test build #45232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45232/consoleFull)** for PR 9089 at commit [`5abceae`](https://github.com/apache/spark/commit/5abceaebadedc130feeab7aec8b97f4fac3bdfda).
[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154499913 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154499940 Build started. sha1 is merged.
[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154499941 Build started. sha1 is merged.
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154499906 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44174052 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.util + +import java.nio.ByteBuffer +import java.util.concurrent.LinkedBlockingQueue +import java.util.{Iterator => JIterator} + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.{Await, Promise} +import scala.concurrent.duration._ +import scala.util.control.NonFatal + +import org.apache.spark.{SparkException, Logging} +import org.apache.spark.util.Utils + +/** + * A wrapper for a WriteAheadLog that batches records before writing data. Handles aggregation + * during writes, and de-aggregation in the `readAll` method. The end consumer has to handle + * de-aggregation after the `read` method. In addition, the `WriteAheadLogRecordHandle` returned + * after the write will contain the batch of records rather than individual records. 
+ * + * When writing a batch of records, the `time` passed to the `wrappedLog` will be the timestamp + * of the latest record in the batch. This is very important in achieving correctness. Consider the + * following example: + * We receive records with timestamps 1, 3, 5, 7. We use "log-1" as the filename. Once we receive + * a clean up request for timestamp 3, we would clean up the file "log-1", and lose data regarding + * 5 and 7. + * + * All other methods of the WriteAheadLog interface will be passed on to the wrapped WriteAheadLog. + */ +private[util] class BatchedWriteAheadLog(val wrappedLog: WriteAheadLog) + extends WriteAheadLog with Logging { + + import BatchedWriteAheadLog._ + + // exposed for tests --- End diff -- nit: not necessary
[GitHub] spark pull request: [SPARK-10666][SPARK-6880][CORE] Use properties...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/6291#issuecomment-154502060 Nothing actually new -- I still think this one is ready to go after a little review. I am noting, however, that the prior discussion misses one element that raises the significance of retaining the correct job properties a little more: the executionId in SQLExecution is also a local job property, so we really don't want to be losing that unnecessarily. @marmbrus
[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44174435 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.util + +import java.nio.ByteBuffer +import java.util.concurrent.LinkedBlockingQueue +import java.util.{Iterator => JIterator} + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.{Await, Promise} +import scala.concurrent.duration._ +import scala.util.control.NonFatal + +import org.apache.spark.{SparkException, Logging} +import org.apache.spark.util.Utils + +/** + * A wrapper for a WriteAheadLog that batches records before writing data. Handles aggregation + * during writes, and de-aggregation in the `readAll` method. The end consumer has to handle + * de-aggregation after the `read` method. In addition, the `WriteAheadLogRecordHandle` returned + * after the write will contain the batch of records rather than individual records. 
+ * + * When writing a batch of records, the `time` passed to the `wrappedLog` will be the timestamp + * of the latest record in the batch. This is very important in achieving correctness. Consider the + * following example: + * We receive records with timestamps 1, 3, 5, 7. We use "log-1" as the filename. Once we receive + * a clean up request for timestamp 3, we would clean up the file "log-1", and lose data regarding + * 5 and 7. + * + * All other methods of the WriteAheadLog interface will be passed on to the wrapped WriteAheadLog. + */ +private[util] class BatchedWriteAheadLog(val wrappedLog: WriteAheadLog) + extends WriteAheadLog with Logging { + + import BatchedWriteAheadLog._ + + // exposed for tests + private val walWriteQueue = new LinkedBlockingQueue[Record]() + + // Whether the writer thread is active + @volatile private var active: Boolean = true + private val buffer = new ArrayBuffer[Record]() + + private val batchedWriterThread = startBatchedWriterThread() + + /** + * Write a byte buffer to the log file. This method adds the byteBuffer to a queue and blocks + * until the record is properly written by the parent. + */ + override def write(byteBuffer: ByteBuffer, time: Long): WriteAheadLogRecordHandle = { +val promise = Promise[WriteAheadLogRecordHandle]() +walWriteQueue.offer(Record(byteBuffer, time, promise)) +Await.result(promise.future, WAL_WRITE_STATUS_TIMEOUT.milliseconds) --- End diff -- I think it's better to add a configuration for `WAL_WRITE_STATUS_TIMEOUT`. The default value is 5 seconds. It may be too short considering now we write a batch of records. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
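The write path under review (enqueue a record, then block on a promise until a background thread has written the batch) and the suggested configurable timeout can be sketched in Python as follows. This is an illustration, not the Scala code in the PR: `BatchedLog`, `write_batch`, and the `write_timeout` parameter are hypothetical names, and the 5-second default only echoes the value mentioned in the review comment. Each batch is written with the timestamp of its latest record, as the class comment in the diff requires.

```python
import queue
import threading
from concurrent.futures import Future


class BatchedLog:
    """Illustrative batching wrapper; names and defaults are assumptions."""

    def __init__(self, write_batch, write_timeout=5.0):
        self._write_batch = write_batch      # callable(payloads, time) -> handle
        self._write_timeout = write_timeout  # seconds a caller waits for its batch
        self._queue = queue.Queue()
        self._active = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, payload, time):
        """Enqueue one record and block until its batch has been written."""
        future = Future()
        self._queue.put((payload, time, future))
        # Raises a timeout error if the batch is not written in time.
        return future.result(timeout=self._write_timeout)

    def _run(self):
        while self._active:
            try:
                batch = [self._queue.get(timeout=0.05)]
            except queue.Empty:
                continue
            # Drain whatever else is already queued into the same batch.
            while True:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            try:
                # Use the latest timestamp so cleanup cannot drop newer records.
                latest = max(time for _, time, _ in batch)
                handle = self._write_batch([p for p, _, _ in batch], latest)
                for _, _, future in batch:
                    future.set_result(handle)
            except Exception as exc:
                for _, _, future in batch:
                    future.set_exception(exc)

    def close(self):
        """Stop the writer thread once its current iteration finishes."""
        self._active = False
        self._thread.join()
```

Because one call to `write_batch` now serves several callers, the time each caller waits grows with batch size, which is the reason given in the review for making the timeout configurable. A fuller version would also fail any still-pending futures on `close`, as the doc comment in the diff describes.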
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154502199 Build started. sha1 is merged.
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154502163 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
GitHub user rekhajoshm opened a pull request:

    https://github.com/apache/spark/pull/9525

    [SPARK-11531] [ML] : SparseVector error Msg

    PySpark SparseVector should have "Found duplicate indices" error message

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rekhajoshm/spark SPARK-11531

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9525.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9525

commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi
Date: 2015-05-05T23:10:08Z

    Merge pull request #1 from apache/master

    Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi
Date: 2015-05-08T21:49:09Z

    Merge pull request #2 from apache/master

    pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi
Date: 2015-06-22T00:08:08Z

    Merge pull request #3 from apache/master

    Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi
Date: 2015-09-17T01:03:09Z

    Merge pull request #4 from apache/master

    Pulling functionality from apache spark

commit d74e0848e2ca9333a2e51c16ed9758bb50ae333c
Author: Joshi
Date: 2015-11-06T19:07:56Z

    Fix for sparsevector error
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154503752 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154503748 Build finished. No test results found.
[GitHub] spark pull request: [SPARK-11500][SQL] Not deterministic order of ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9517#issuecomment-154503753 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45233/
[GitHub] spark pull request: [SPARK-11531] [ML] : SparseVector error Msg
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9525#issuecomment-154504166 **[Test build #45236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45236/consoleFull)** for PR 9525 at commit [`0993a5b`](https://github.com/apache/spark/commit/0993a5b42ed72a408c6de3b39f7193f79b715ccb).
[GitHub] spark pull request: [SPARK-10978] [SQL] [FOLLOW-UP] More comprehen...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9468
[GitHub] spark pull request: [SPARK-9858] [SQL] Add an ExchangeCoordinator ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9453#issuecomment-154504554 Thanks for reviewing! I am merging it to master and branch 1.6.