[GitHub] spark issue #14258: [Spark-16579][SparkR] add install_spark function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14258 **[Test build #62912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62912/consoleFull)** for PR 14258 at commit [`22f2f78`](https://github.com/apache/spark/commit/22f2f786bceeb599645c12210e3f49e66378ba6c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14258: [Spark-16579][SparkR] add install_spark function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14258 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62911/ Test FAILed.
[GitHub] spark issue #14258: [Spark-16579][SparkR] add install_spark function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14258 Merged build finished. Test FAILed.
[GitHub] spark issue #14258: [Spark-16579][SparkR] add install_spark function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14258 **[Test build #62911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62911/consoleFull)** for PR 14258 at commit [`fa94e3c`](https://github.com/apache/spark/commit/fa94e3cc99e93aea708a609733bbe9364b904efe).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14258: [Spark-16579][SparkR] add install_spark function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14258 **[Test build #62911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62911/consoleFull)** for PR 14258 at commit [`fa94e3c`](https://github.com/apache/spark/commit/fa94e3cc99e93aea708a609733bbe9364b904efe).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 BTW, this is not only about user-given schemas. Currently, datasources based on `FileFormat` always read data into a DataFrame while ignoring the nullability in the schema (for both user-given and inferred/read schemas). However, this does not happen when the same datasources read streams (or when using another JSON API). So this PR tries to make them consistent by ignoring the nullability in the schema.
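For illustration only (the names and schema representation here are hypothetical, not Spark's actual API), the behavior being made consistent above, forcing every field of a user-given schema to nullable before reading, can be sketched as:

```python
# Hypothetical sketch of "ignore nullability in a user-given schema":
# every field is forced to nullable before the data is read.
def as_nullable(schema):
    """Return a copy of the schema with every field marked nullable.

    A schema here is a list of (name, type, nullable) tuples, a stand-in
    for Spark's StructType used only to illustrate the idea.
    """
    return [(name, dtype, True) for (name, dtype, _) in schema]

user_schema = [("id", "int", False), ("name", "string", True)]
# Every field comes back nullable, regardless of what the user declared.
print(as_nullable(user_schema))
```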
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Thanks for the feedback @cloud-fan! If the user-given schema is wrong, it is handled differently by each datasource:
- For JSON and CSV, it is generally permissive (for example, allowing compatibility among numeric types).
- For ORC and Parquet, it is generally strict about types, so they don't allow such compatibility (except for a very few cases, e.g. for Parquet, https://github.com/apache/spark/pull/14272 and https://github.com/apache/spark/pull/14278). I think so. Should we disallow specifying schemas for these?
- For JDBC, it does not take a user-given schema since it does not implement `SchemaRelationProvider`.
[GitHub] spark pull request #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBui...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14257
[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14257 LGTM, merging to master. Thanks!
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14358 **[Test build #62910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62910/consoleFull)** for PR 14358 at commit [`0161896`](https://github.com/apache/spark/commit/016189620d711f8e8abb0b2886b9b35ac1321911).
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14358 LGTM, pending jenkins.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14124 What will happen if the given schema is wrong? It seems weird that we allow users to provide schema while reading the data, but without validating it.
[GitHub] spark pull request #14358: [SPARK-16729][SQL] Throw analysis exception for i...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14358#discussion_r72380761
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -54,7 +54,9 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     // follow [[org.apache.spark.sql.catalyst.expressions.Cast.canCast]] logic
     // to ensure we test every possible cast situation here
     atomicTypes.zip(atomicTypes).foreach { case (from, to) =>
-      checkNullCast(from, to)
+      if (Cast.canCast(from, to)) {
--- End diff --
removed
[GitHub] spark pull request #14358: [SPARK-16729][SQL] Throw analysis exception for i...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14358#discussion_r72380667
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -54,7 +54,9 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     // follow [[org.apache.spark.sql.catalyst.expressions.Cast.canCast]] logic
     // to ensure we test every possible cast situation here
     atomicTypes.zip(atomicTypes).foreach { case (from, to) =>
-      checkNullCast(from, to)
+      if (Cast.canCast(from, to)) {
--- End diff --
Ah, this is doing a self cast. I read it wrong; let me remove it.
[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14296#discussion_r72380502
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1207,6 +1207,17 @@ class Analyzer(
           val alias = Alias(ae, ae.toString)()
           aggregateExpressions += alias
           alias.toAttribute
+          // Replacing [[NamedExpression]] causes the error on [[Grouping]] because the
+          // grouping column will be new attribute created by adding additional [[Alias]].
+          // So we can't find the grouping column and replace it in the rule
+          // [[ResolveGroupingAnalytics]].
--- End diff --
I don't quite understand this comment, can you give a concrete example?
[GitHub] spark pull request #14358: [SPARK-16729][SQL] Throw analysis exception for i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14358#discussion_r72380321
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -54,7 +54,9 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     // follow [[org.apache.spark.sql.catalyst.expressions.Cast.canCast]] logic
     // to ensure we test every possible cast situation here
     atomicTypes.zip(atomicTypes).foreach { case (from, to) =>
-      checkNullCast(from, to)
+      if (Cast.canCast(from, to)) {
--- End diff --
```
def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
  case (fromType, toType) if fromType == toType => true
  ..
```
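As an aside, the point being made here can be shown with a small sketch (not Spark's code): zipping a list with itself only ever yields self-pairs, and the quoted first rule of `canCast` makes identical types always castable, so the guard in the test can never be false.

```python
# Sketch (not Spark's code) of why the canCast guard is redundant above:
# zipping a list with itself only ever yields self-pairs.
atomic_types = ["byte", "short", "int", "long", "date", "timestamp"]

def can_cast(from_type, to_type):
    # Mirrors only the quoted fast path of Cast.canCast;
    # the remaining rules are elided.
    return from_type == to_type

pairs = list(zip(atomic_types, atomic_types))
assert all(f == t for f, t in pairs)          # every pair is a self-pair
assert all(can_cast(f, t) for f, t in pairs)  # so the guard always passes
```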
[GitHub] spark pull request #14358: [SPARK-16729][SQL] Throw analysis exception for i...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14358#discussion_r72380199
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -54,7 +54,9 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     // follow [[org.apache.spark.sql.catalyst.expressions.Cast.canCast]] logic
     // to ensure we test every possible cast situation here
     atomicTypes.zip(atomicTypes).foreach { case (from, to) =>
-      checkNullCast(from, to)
+      if (Cast.canCast(from, to)) {
--- End diff --
Not all atomicTypes can cast from each other? E.g. date.
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72380101
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLCompatibilityFunctionSuite.scala ---
@@ -69,4 +73,25 @@ class SQLCompatibilityFunctionSuite extends QueryTest with SharedSQLContext {
       sql("SELECT nvl2(null, 1, 2.1d), nvl2('n', 1, 2.1d)"),
       Row(2.1, 1.0))
   }
+
+  test("SPARK-16730 cast alias functions for Hive compatibility") {
+    checkAnswer(
+      sql("SELECT boolean(1), tinyint(1), smallint(1), int(1), bigint(1)"),
+      Row(true, 1.toByte, 1.toShort, 1, 1L))
+
+    checkAnswer(
+      sql("SELECT float(1), double(1), decimal(1)"),
+      Row(1.toFloat, 1.0, new BigDecimal(1)))
+
+    checkAnswer(
+      sql("SELECT date(\"2014-04-04\"), timestamp(date(\"2014-04-04\"))"),
+      Row(new java.util.Date(114, 3, 4), new Timestamp(114, 3, 4, 0, 0, 0, 0)))
+
+    checkAnswer(
+      sql("SELECT string(1)"),
+      Row("1"))
+
+    // Error handling: only one argument
+    assert(intercept[AnalysisException](sql("SELECT string(1, 2)")).getMessage.contains("one arg"))
--- End diff --
fixed
[GitHub] spark issue #14364: [SPARK-16730][SQL] Implement function aliases for type c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14364 **[Test build #62909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62909/consoleFull)** for PR 14364 at commit [`3b78da3`](https://github.com/apache/spark/commit/3b78da343c06b7f1df2a67136cda99b4b74bc0f7).
[GitHub] spark pull request #14358: [SPARK-16729][SQL] Throw analysis exception for i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14358#discussion_r72379875
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -54,7 +54,9 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     // follow [[org.apache.spark.sql.catalyst.expressions.Cast.canCast]] logic
     // to ensure we test every possible cast situation here
     atomicTypes.zip(atomicTypes).foreach { case (from, to) =>
-      checkNullCast(from, to)
+      if (Cast.canCast(from, to)) {
--- End diff --
Why this check? Doesn't `from` always equal `to` here?
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72379526
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -408,8 +409,21 @@ object FunctionRegistry {
     expression[BitwiseAnd]("&"),
     expression[BitwiseNot]("~"),
     expression[BitwiseOr]("|"),
-    expression[BitwiseXor]("^")
-
+    expression[BitwiseXor]("^"),
+
+    // Cast aliases (SPARK-16730)
+    castAlias("boolean", BooleanType),
+    castAlias("tinyint", ByteType),
+    castAlias("smallint", ShortType),
+    castAlias("int", IntegerType),
+    castAlias("bigint", LongType),
--- End diff --
OK, agreed.
[GitHub] spark issue #14364: [SPARK-16730][SQL] Implement function aliases for type c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14364 mostly LGTM, thanks for working on it!
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14358 Is this good to merge?
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72379517
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -408,8 +409,21 @@ object FunctionRegistry {
     expression[BitwiseAnd]("&"),
     expression[BitwiseNot]("~"),
     expression[BitwiseOr]("|"),
-    expression[BitwiseXor]("^")
-
+    expression[BitwiseXor]("^"),
+
+    // Cast aliases (SPARK-16730)
+    castAlias("boolean", BooleanType),
+    castAlias("tinyint", ByteType),
+    castAlias("smallint", ShortType),
+    castAlias("int", IntegerType),
+    castAlias("bigint", LongType),
+    castAlias("float", FloatType),
+    castAlias("double", DoubleType),
+    castAlias("decimal", DecimalType.USER_DEFAULT),
--- End diff --
This is not what Hive does by default, but what Spark SQL's cast defaults to. I think it is a bug, but I'm not sure if it is intentional. I suggest we change this in a separate pull request, since there is more than one place to check.
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72379516
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLCompatibilityFunctionSuite.scala ---
@@ -69,4 +73,25 @@ class SQLCompatibilityFunctionSuite extends QueryTest with SharedSQLContext {
       sql("SELECT nvl2(null, 1, 2.1d), nvl2('n', 1, 2.1d)"),
       Row(2.1, 1.0))
   }
+
+  test("SPARK-16730 cast alias functions for Hive compatibility") {
+    checkAnswer(
+      sql("SELECT boolean(1), tinyint(1), smallint(1), int(1), bigint(1)"),
+      Row(true, 1.toByte, 1.toShort, 1, 1L))
+
+    checkAnswer(
+      sql("SELECT float(1), double(1), decimal(1)"),
+      Row(1.toFloat, 1.0, new BigDecimal(1)))
+
+    checkAnswer(
+      sql("SELECT date(\"2014-04-04\"), timestamp(date(\"2014-04-04\"))"),
+      Row(new java.util.Date(114, 3, 4), new Timestamp(114, 3, 4, 0, 0, 0, 0)))
+
+    checkAnswer(
+      sql("SELECT string(1)"),
+      Row("1"))
+
+    // Error handling: only one argument
+    assert(intercept[AnalysisException](sql("SELECT string(1, 2)")).getMessage.contains("one arg"))
--- End diff --
How about we use the full error message here?
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72379420
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -408,8 +409,21 @@ object FunctionRegistry {
     expression[BitwiseAnd]("&"),
     expression[BitwiseNot]("~"),
     expression[BitwiseOr]("|"),
-    expression[BitwiseXor]("^")
-
+    expression[BitwiseXor]("^"),
+
+    // Cast aliases (SPARK-16730)
+    castAlias("boolean", BooleanType),
+    castAlias("tinyint", ByteType),
+    castAlias("smallint", ShortType),
+    castAlias("int", IntegerType),
+    castAlias("bigint", LongType),
+    castAlias("float", FloatType),
+    castAlias("double", DoubleType),
+    castAlias("decimal", DecimalType.USER_DEFAULT),
--- End diff --
Can you double-check it with Hive? What's the default decimal type in Hive?
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72379213
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -408,8 +409,21 @@ object FunctionRegistry {
     expression[BitwiseAnd]("&"),
     expression[BitwiseNot]("~"),
     expression[BitwiseOr]("|"),
-    expression[BitwiseXor]("^")
-
+    expression[BitwiseXor]("^"),
+
+    // Cast aliases (SPARK-16730)
+    castAlias("boolean", BooleanType),
+    castAlias("tinyint", ByteType),
+    castAlias("smallint", ShortType),
+    castAlias("int", IntegerType),
+    castAlias("bigint", LongType),
--- End diff --
I think that's actually worse, because it makes it less clear what the function name is when looking at this source file. Also, if for some reason we change `LongType.simpleString` in the future, these functions will subtly break.
[GitHub] spark issue #14362: [SPARK-16730][SQL] Implement function aliases for type c...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14362 Closing this one in favor of https://github.com/apache/spark/pull/14364
[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee closed the pull request at: https://github.com/apache/spark/pull/14362
[GitHub] spark issue #14375: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14375 Can one of the admins verify this patch?
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72379046
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -408,8 +409,21 @@ object FunctionRegistry {
     expression[BitwiseAnd]("&"),
     expression[BitwiseNot]("~"),
     expression[BitwiseOr]("|"),
-    expression[BitwiseXor]("^")
-
+    expression[BitwiseXor]("^"),
+
+    // Cast aliases (SPARK-16730)
+    castAlias("boolean", BooleanType),
+    castAlias("tinyint", ByteType),
+    castAlias("smallint", ShortType),
+    castAlias("int", IntegerType),
+    castAlias("bigint", LongType),
--- End diff --
Using `LongType.simpleString` instead of `"bigint"` looks better. Same for the others.
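The tradeoff debated here can be sketched with a hypothetical registry (not Spark's actual `FunctionRegistry`): deriving the registered name from the type stays in sync with the type definition, while a literal string keeps the name visible at the call site.

```python
# Hypothetical registry sketch contrasting the two styles debated above.
class LongType:
    # Stand-in for Scala's LongType.simpleString.
    simple_string = "bigint"

registry = {}

def cast_alias(name, dtype):
    """Register a cast-style function under the given name."""
    registry[name] = dtype

# Style 1: literal name -- the registered function name is explicit here,
# but would silently diverge if the type's name ever changed.
cast_alias("bigint", LongType)

# Style 2: derived name -- stays in sync with the type definition,
# but the actual registered name is no longer visible at the call site.
cast_alias(LongType.simple_string, LongType)

assert "bigint" in registry
```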
[GitHub] spark issue #13248: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...
Github user praveendareddy21 commented on the issue: https://github.com/apache/spark/pull/13248 Reopened the pull request on master branch. https://github.com/apache/spark/pull/14375
[GitHub] spark pull request #14375: [SPARK-15194] [ML] Add Python ML API for Multivar...
GitHub user praveendareddy21 opened a pull request: https://github.com/apache/spark/pull/14375 [SPARK-15194] [ML] Add Python ML API for MultivariateGaussian

## What changes were proposed in this pull request?

Added MultivariateGaussian and tests to match Scala's ML API. Ran pep8 and made other doc changes. Reopening the pull request from the 2.0 branch at an admin's request.

## How was this patch tested?

Unit tests: MultiVariateGaussianTests. Also tested manually on a local setup.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/praveendareddy21/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14375.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14375

commit aa5f64c1c7d84d88b1c972e5f18236af615bd89f
Author: red
Date: 2016-07-27T04:04:57Z

    added Multivariate Gaussian for ML API
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62908/ Test PASSed.
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Merged build finished. Test PASSed.
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62908/consoleFull)** for PR 14333 at commit [`7f042a2`](https://github.com/apache/spark/commit/7f042a2172166d0de413297351b4fe9b04168071). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14341: [Minor][Doc][SQL] Fix two documents regarding siz...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14341#discussion_r72378681

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -122,10 +124,10 @@ object SQLConf {
   val DEFAULT_SIZE_IN_BYTES = SQLConfigBuilder("spark.sql.defaultSizeInBytes")
     .internal()
-    .doc("The default table size used in query planning. By default, it is set to a larger " +
-      "value than `spark.sql.autoBroadcastJoinThreshold` to be more conservative. That is to say " +
-      "by default the optimizer will not choose to broadcast a table unless it knows for sure " +
-      "its size is small enough.")
+    .doc("The default table size used in query planning. By default, it is set to Long.MaxValue " +
+      "which is more than `spark.sql.autoBroadcastJoinThreshold` to be more conservative. " +
```

--- End diff --

`which is larger than`
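For context on why the `Long.MaxValue` default is conservative: the planner only broadcasts a relation whose estimated size falls below `spark.sql.autoBroadcastJoinThreshold`, so a table of unknown size that defaults to `Long.MaxValue` can never be chosen for a broadcast join. The sketch below is an illustrative simplification of that size test, not Spark's actual planner code; only the 10 MB default threshold is Spark's real value.

```scala
// Simplified model of the broadcast-join size test described in the doc string.
def shouldBroadcast(estimatedSizeInBytes: Long, thresholdInBytes: Long): Boolean =
  estimatedSizeInBytes <= thresholdInBytes

val autoBroadcastJoinThreshold = 10L * 1024 * 1024  // Spark's default: 10 MB

// A small known table may be broadcast; an unknown-size table never is.
val smallTable = shouldBroadcast(1024L, autoBroadcastJoinThreshold)
val unknownSizeTable = shouldBroadcast(Long.MaxValue, autoBroadcastJoinThreshold)
```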
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r72378623

```
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,209 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
+
+  private def createDataSourceTable(
+      path: File,
+      userSpecifiedSchema: Option[String],
+      userSpecifiedPartitionCols: Option[String]): (StructType, Seq[String]) = {
```

--- End diff --

how about we pass in the expected schema and partCols, and do the check in this method?
[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72377491

```
--- Diff: R/pkg/R/mllib.R ---
@@ -292,6 +299,43 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
     return(list(apriori = apriori, tables = tables))
   })
+
+#' Isotonic Regression Model
+#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg().
+#' Users can print, make predictions on the produced model and save the model to the input path.
+#'
+#' @param data SparkDataFrame for training
+#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
+#'        operators are supported, including '~', '.', ':', '+', and '-'.
+#' @param isotonic Whether the output sequence should be isotonic/increasing (true) or
+#'        antitonic/decreasing (false)
+#' @param featureIndex The index of the feature if \code{featuresCol} is a vector column (default: `0`),
+#'        no effect otherwise
+#' @return \code{spark.isotonicRegression} returns a fitted Isotonic Regression model
+#' @rdname spark.isotonicRegression
+#' @name spark.isotonicRegression
+#' @export
```

--- End diff --

Add `@examples`
[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72377343

```
--- Diff: R/pkg/NAMESPACE ---
@@ -24,7 +24,8 @@ exportMethods("glm",
               "spark.kmeans",
               "fitted",
               "spark.naiveBayes",
-              "spark.survreg")
+              "spark.survreg",
+              "spark.isotonicRegression")
```

--- End diff --

Spark MLlib `IsotonicRegression` is more similar to R's [`pava`](http://www.inside-r.org/packages/cran/Iso/docs/pava)? Would `spark.pava` be a better name?
[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72376639

```
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/IsotonicRegressionWrapper.scala ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute}
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.regression.{IsotonicRegression, IsotonicRegressionModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class IsotonicRegressionWrapper private (
+    val pipeline: PipelineModel,
+    val labels: Array[String],
+    val features: Array[String]) extends MLWritable {
+
+  private val isotonicRegressionModel: IsotonicRegressionModel =
+    pipeline.stages(1).asInstanceOf[IsotonicRegressionModel]
+
+  lazy val boundaries: Array[Double] = isotonicRegressionModel.boundaries.toArray
+
+  lazy val predictions: Array[Double] = isotonicRegressionModel.predictions.toArray
+
+  def fitted(method: String): Array[Double] = {
+    if (method == "boundaries") {
+      boundaries
+    } else if (method == "predictions") {
+      predictions
+    } else {
+      throw new UnsupportedOperationException(
+        s"Method (boundaries or predictions) required but $method found.")
+    }
+  }
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+    pipeline.transform(dataset).drop(isotonicRegressionModel.getFeaturesCol)
+  }
+
+  override def write: MLWriter =
+    new IsotonicRegressionWrapper.IsotonicRegressionWrapperWriter(this)
+}
+
+private[r] object IsotonicRegressionWrapper
+    extends MLReadable[IsotonicRegressionWrapper] {
+
+  def fit(
+      data: DataFrame,
+      formula: String,
+      isotonic: Boolean,
+      featureIndex: Int): IsotonicRegressionWrapper = {
+
+    val rFormulaModel = new RFormula()
+      .setFormula(formula)
+      .fit(data)
+
+    // get feature names from output schema
+    val schema = rFormulaModel.transform(data).schema
+    val labelAttr = Attribute.fromStructField(schema(rFormulaModel.getLabelCol))
+      .asInstanceOf[NominalAttribute]
+    val labels = labelAttr.values.get
```

--- End diff --

Since `IsotonicRegression` is a regression model, it's unnecessary to extract labels from column metadata (we actually did not save `NominalAttribute` for regression models). Thanks!
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62908/consoleFull)** for PR 14333 at commit [`7f042a2`](https://github.com/apache/spark/commit/7f042a2172166d0de413297351b4fe9b04168071).
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Merged build finished. Test FAILed.
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62907/ Test FAILed.
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62907/consoleFull)** for PR 14333 at commit [`dc17da8`](https://github.com/apache/spark/commit/dc17da8eec232fcf2296deefb64222a6d07a0983). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72375060

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   override def foldable: Boolean = children.forall(_.foldable)
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
```

--- End diff --

Hi, @yhuai. Could you give me some advice?
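The branch under discussion special-cases all-decimal inputs so that decimals with different precision and scale can later be widened to a common decimal type instead of being rejected. A self-contained sketch of that check, using simplified stand-in types rather than Spark's catalyst classes (`SqlType`, `arrayInputsOk`, and the fallback rule are illustrative assumptions):

```scala
// Simplified stand-ins for catalyst data types.
sealed trait SqlType
final case class DecimalType(precision: Int, scale: Int) extends SqlType
case object IntType extends SqlType
case object StringType extends SqlType

// Mirrors the shape of the proposed checkInputDataTypes: all-decimal inputs
// pass (a common wider decimal can be derived later); otherwise every input
// must already share one type.
def arrayInputsOk(types: Seq[SqlType]): Boolean =
  types.forall(_.isInstanceOf[DecimalType]) || types.distinct.size <= 1
```

So `array(decimal(10,2), decimal(38,18))` would be accepted for widening, while `array(int, string)` would still be rejected.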
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14311 Looks like the second Jenkins run failed a slightly different set of tests than the first.
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14311 Merged build finished. Test FAILed.
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14311 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62906/ Test FAILed.
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14311 **[Test build #62906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62906/consoleFull)** for PR 14311 at commit [`7cccb39`](https://github.com/apache/spark/commit/7cccb39ec967df68427304a605dd52deade11573). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62907/consoleFull)** for PR 14333 at commit [`dc17da8`](https://github.com/apache/spark/commit/dc17da8eec232fcf2296deefb64222a6d07a0983).
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Could anybody help review this PR? Thanks.
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/13824 That is fine too. I'd just do something like:

```
Seq("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON").foreach { envname =>
  // code to set the value
}
```

To avoid the repetition. BTW I just noticed you have a typo in the PR title.
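The deduplicated form suggested above can be fleshed out into a self-contained snippet. Here `env` stands in for the launcher's mutable environment map, `sysEnv` is passed in instead of reading `sys.env` directly so the logic is testable, and `propagatePythonEnv` is an illustrative name, not Spark's:

```scala
import scala.collection.mutable

// Propagate PYSPARK_DRIVER_PYTHON / PYSPARK_PYTHON from the submitter's
// environment into the driver's env map, without clobbering values the
// user already set explicitly.
def propagatePythonEnv(env: mutable.Map[String, String],
                       sysEnv: Map[String, String]): Unit = {
  Seq("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON").foreach { envname =>
    if (!env.contains(envname)) {
      sysEnv.get(envname).foreach(v => env(envname) = v)
    }
  }
}
```

A user-supplied `PYSPARK_PYTHON` in `env` wins; only missing keys are filled from the submitter's environment.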
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user KevinGrealish commented on the issue: https://github.com/apache/spark/pull/13824 How about just this:

```
// propagate PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON to driver in cluster mode
if (!env.contains("PYSPARK_DRIVER_PYTHON")) {
  sys.env.get("PYSPARK_DRIVER_PYTHON").foreach(env("PYSPARK_DRIVER_PYTHON") = _)
}
if (!env.contains("PYSPARK_PYTHON")) {
  sys.env.get("PYSPARK_PYTHON").foreach(env("PYSPARK_PYTHON") = _)
}
```
[GitHub] spark issue #14296: [SPARK-16639][SQL] The query with having condition that ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14296 ping @cloud-fan Any more comments? Thanks.
[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked the places where `RDD.persist` is referenced: `AFTSurvivalRegression`, `LinearRegression`, and `LogisticRegression` persist the input training RDD and unpersist it when `train` returns, which seems OK. `recommend.ALS` persists many RDDs and appears to unpersist them all correctly. mllib's `BisectingKMeans.run` contains a TODO "unpersist old indices"; I'll check it now. The others seem OK. The places where `Broadcast.persist` is referenced were already checked in this PR. I think they are all properly handled here.
[GitHub] spark issue #14341: [Minor][Doc][SQL] Fix two documents regarding size in by...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14341 ping @cloud-fan
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14270 Merged build finished. Test PASSed.
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14270 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62904/ Test PASSed.
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14270 **[Test build #62904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62904/consoleFull)** for PR 14270 at commit [`8923c58`](https://github.com/apache/spark/commit/8923c58d324b8083ffb423d165f4707ec4395db2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14270 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62903/ Test PASSed.
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14270 Merged build finished. Test PASSed.
[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14340 So I would like to -1 this patch. I think it's been a mistake to support DStreams in Python -- yes, it satisfies a checkbox and Spark can claim there's streaming support in Python. However, the tooling and maturity for working with streaming data (both in Spark and the broader ecosystem) is simply not there. It is a lot of baggage to maintain, and it creates the wrong impression that production streaming jobs can be written in Python.
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14270 **[Test build #62903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62903/consoleFull)** for PR 14270 at commit [`b9c9a7a`](https://github.com/apache/spark/commit/b9c9a7aa2b831247ae04d655f537223a02bc8440). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r72368797 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -552,7 +552,12 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( taskId: String, reason: String): Unit = { stateLock.synchronized { - removeExecutor(taskId, SlaveLost(reason)) + // Do not call removeExecutor() after this scheduler backend was stopped because --- End diff -- what about submitting another JIRA issue on better handling of state management after stop() is called for CoarseGrainedSchedulerBackend?
[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175 Sure, will add it
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/13824 I think there are two ways to solve the problem that might be a little better... The first is to try to keep the current behavior. If, in L734 (where you're removing the current code that checks the `appMasterEnv` conf), you make a copy of the keys that were added to the `env` variable, you can then apply `addPathToEnvironment` just to the keys that are there. That means that the code will merge any user configuration with env variables created by Spark itself; otherwise it will use the user's override. The second is to read certain env variables using a special method that first looks at `spark.yarn.appMasterEnv.FOO` and if it doesn't exist, `sys.env("FOO")`. Then you could modify the code that currently reads `PYSPARK_DRIVER_PYTHON` and friends using that new method, instead of directly peeking at `sys.env`. You could also apply that new method to the code that currently reads `PYTHONPATH`. I think the latter is a better solution than you currently have, since it avoids hardcoding these env variable names in more places.
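The fallback lookup vanzin describes can be sketched in shell. This is a hypothetical illustration, not Spark's actual implementation (which reads the SparkConf in Scala); the `OVERRIDE_<NAME>` variables stand in for `spark.yarn.appMasterEnv.<NAME>` entries.

```shell
# Hypothetical sketch of the proposed lookup order: a configured override
# (simulated here as OVERRIDE_<NAME>) wins; otherwise fall back to the
# ambient environment variable, mirroring sys.env("NAME") in the Scala code.
resolve_env() {
  name="$1"
  override=$(eval "printf '%s' \"\${OVERRIDE_${name}:-}\"")
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    eval "printf '%s\n' \"\${${name}:-}\""
  fi
}

PYSPARK_DRIVER_PYTHON=python2
OVERRIDE_PYSPARK_DRIVER_PYTHON=python3
resolve_env PYSPARK_DRIVER_PYTHON   # prints "python3": the override wins
```

The point of centralizing the lookup in one function is the same as in the comment above: the precedence rule lives in one place instead of being repeated wherever `PYSPARK_DRIVER_PYTHON` or `PYTHONPATH` is read.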
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14311 Ok weird, I can't reproduce locally any more, even after many tries. I wonder if it's just very rarely flaky, though that seems unlikely.
[GitHub] spark issue #14311: [SPARK-16550] [core] Certain classes fail to deserialize...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14311 **[Test build #62906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62906/consoleFull)** for PR 14311 at commit [`7cccb39`](https://github.com/apache/spark/commit/7cccb39ec967df68427304a605dd52deade11573).
[GitHub] spark pull request #14349: [SPARK-16524][SQL] Add RowBatch and RowBasedHashM...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14349
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14065 I'm having trouble finding the bandwidth to look at the updated patch, but it's on my list... there were some replies to my comments that I want to take a closer look at.
[GitHub] spark issue #14349: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14349 Merging in master.
[GitHub] spark pull request #14349: [SPARK-16524][SQL] Add RowBatch and RowBasedHashM...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14349#discussion_r72365970 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.catalyst.expressions; + +import org.apache.spark.memory.TaskMemoryManager; +import org.apache.spark.sql.types.*; +import org.apache.spark.unsafe.Platform; + +/** + * An implementation of `RowBasedKeyValueBatch` in which key-value records have variable lengths. + * + * The format for each record looks like this: + * [4 bytes total size = (klen + vlen + 4)] [4 bytes key size = klen] + * [UnsafeRow for key of length klen] [UnsafeRow for Value of length vlen] + * [8 bytes pointer to next] + * Thus, record length = 4 + 4 + klen + vlen + 8 + */ +public final class VariableLengthRowBasedKeyValueBatch extends RowBasedKeyValueBatch { --- End diff -- you can write the test suites in scala -- it tends to simplify the code.
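The record-length formula in the quoted doc comment can be sanity-checked with quick shell arithmetic; the klen/vlen values below are arbitrary example sizes, not taken from the patch.

```shell
# Record layout from the doc comment above:
# [4-byte total size][4-byte key size][key bytes][value bytes][8-byte next pointer]
klen=12
vlen=20
total_size_field=$((klen + vlen + 4))     # the value stored in the first 4 bytes
record_len=$((4 + 4 + klen + vlen + 8))   # full on-heap footprint of the record
echo "$total_size_field $record_len"      # 36 48
```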
[GitHub] spark issue #14364: [SPARK-16730][SQL] Implement function aliases for type c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14364 Merged build finished. Test PASSed.
[GitHub] spark issue #14364: [SPARK-16730][SQL] Implement function aliases for type c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14364 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62902/ Test PASSed.
[GitHub] spark issue #14364: [SPARK-16730][SQL] Implement function aliases for type c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14364 **[Test build #62902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62902/consoleFull)** for PR 14364 at commit [`b8fbcab`](https://github.com/apache/spark/commit/b8fbcab1d5bf78f10b0edce2a1011080a38f4fc6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13152: [SPARK-15353] [CORE] Making peer selection for block rep...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/13152 > The topology info is only queried when the executor initiates and is assumed to stay the same throughout the life of the executor. Depending on the cluster manager being used, I am assuming the exact way this information is provided may differ. Resolving this at the master makes this implementation simpler as only the master needs to be able to access the service/script/class being used to resolve the topology. The communication overhead is minimal as the executors do have to communicate with the master when they initiate anyways. I see, that makes sense, though it is a little weird to ask the master for info that you use to register right away later. > The getRandomPeer() method was doing quite a bit more than just getting a random peer. It was being used to manage/mutate state, which was being mutated in other places as well. I tried to keep the block placement strategy and the usage of its output separate, to make it simpler to provide a new block placement strategy. I also thought it would be best to de-couple any internal replication state management with the block replication strategy, while still keeping the structure of the state the same. Still, I think it would be a smaller change to just move some of that logic out of getRandomPeer(), and retain the rest. Then you just need to implement getNextPeer(), and BlockManager doesn't need to worry about tracking the prioritized order internally.
[GitHub] spark issue #14226: [SPARK-16580][CORE] class Accumulator in package spark i...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/14226 @srowen sorry, I had a very busy week; now I have time to look into it. Will keep you posted :)
[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...
Github user zasdfgbnm commented on a diff in the pull request: https://github.com/apache/spark/pull/14231#discussion_r72364403 --- Diff: bin/spark-class --- @@ -65,24 +65,25 @@ fi # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating # an array that will be used to exec the final command. # -# The exit code of the launcher is appended to the output, so the parent shell removes it from the -# command array and checks the value to see if the launcher succeeded. -build_command() { - "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@" - printf "%d\0" $? -} +# To keep both the output and the exit code of the launcher, the output is first converted to a hex +# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved +# from the bash array ${PIPESTATUS[@]}. +# +# Note that the seperator NULL character can not be replace with space or '\n' so that the command +# won't fail if some path of the user contain special characher such as '\n' or space +# +# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586] +_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]}) --- End diff -- The launcher doesn't actually launch anything; instead, it just outputs the command that should be used to launch the desired class, separated by `\0`.
[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...
Github user zasdfgbnm commented on a diff in the pull request: https://github.com/apache/spark/pull/14231#discussion_r72364159 --- Diff: bin/spark-class --- @@ -65,24 +65,25 @@ fi # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating # an array that will be used to exec the final command. # -# The exit code of the launcher is appended to the output, so the parent shell removes it from the -# command array and checks the value to see if the launcher succeeded. -build_command() { - "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@" - printf "%d\0" $? -} +# To keep both the output and the exit code of the launcher, the output is first converted to a hex +# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved +# from the bash array ${PIPESTATUS[@]}. +# +# Note that the seperator NULL character can not be replace with space or '\n' so that the command +# won't fail if some path of the user contain special characher such as '\n' or space +# +# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586] +_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]}) --- End diff -- If the launcher fails, it is sufficient to terminate the script and exit with the nonzero `$?`. But if it succeeds, then the output, which contains `\0`, should be used to start a new command. That's why, when we execute `"$RUNNER" -Xmx128m ...`, we should try to store both the exit code and the output.
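The hex-dump trick under discussion can be reproduced standalone. This sketch is bash-specific (it relies on `PIPESTATUS`) and requires `xxd`; `fake_launcher` is a made-up stand-in for `org.apache.spark.launcher.Main`.

```shell
#!/usr/bin/env bash
# Hex-encode the NUL-separated output so it survives command substitution,
# and recover the inner command's exit code from PIPESTATUS rather than
# getting the exit code of xxd or tr.
fake_launcher() { printf 'arg one\0arg two\0'; return 3; }

_CMD=$(fake_launcher | xxd -p | tr -d '\n'; exit ${PIPESTATUS[0]})
_STATUS=$?

echo "launcher exit code: $_STATUS"        # 3, the launcher's code, not xxd's
# Decode the hex back into the original NUL-separated command words:
printf '%s' "$_CMD" | xxd -r -p | tr '\0' '\n'
```

This shows why the NUL separator matters: arguments containing spaces (or even newlines) round-trip intact through the hex encoding, which a plain command substitution would mangle.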
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13824 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62905/ Test PASSed.
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13824 **[Test build #62905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62905/consoleFull)** for PR 13824 at commit [`f2c2e4a`](https://github.com/apache/spark/commit/f2c2e4a82ed44d367db67f9382024a619b688104). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13824 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14124 @cloud-fan
[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14172 (cc @liancheng)
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 gentle ping @marmbrus
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14102 @yhuai I addressed the comments!
[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13988 @rxin Could you take a look please?
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13824 **[Test build #62905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62905/consoleFull)** for PR 13824 at commit [`f2c2e4a`](https://github.com/apache/spark/commit/f2c2e4a82ed44d367db67f9382024a619b688104).
[GitHub] spark issue #13824: [SPARK-16110][YARN][PYSPARK] Fix allowing python version...
Github user KevinGrealish commented on the issue: https://github.com/apache/spark/pull/13824 Created https://issues.apache.org/jira/browse/SPARK-16744 for the override/append issue, linked to SPARK-16110. This fix remains just about being able to run Python 3.
[GitHub] spark issue #14374: [SPARK-16735][SQL] `map` should create a decimal key or ...
Github user biglobster commented on the issue: https://github.com/apache/spark/pull/14374 @dongjoon-hyun thank you, and I have just updated the title of this pull request with the JIRA id SPARK-16735
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14270 **[Test build #62904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62904/consoleFull)** for PR 14270 at commit [`8923c58`](https://github.com/apache/spark/commit/8923c58d324b8083ffb423d165f4707ec4395db2).
[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62901/ Test PASSed.
[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14373 Merged build finished. Test PASSed.
[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14373 **[Test build #62901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62901/consoleFull)** for PR 14373 at commit [`a30ca9f`](https://github.com/apache/spark/commit/a30ca9f4cfde295a811cbe144d6cf165be1227c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14370: [SPARK-16713][SQL] Check codegen method size ≤ 8K on c...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14370 @davies would you also take a look? Thanks!
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14270 **[Test build #62903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62903/consoleFull)** for PR 14270 at commit [`b9c9a7a`](https://github.com/apache/spark/commit/b9c9a7aa2b831247ae04d655f537223a02bc8440).
[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...
Github user markgrover commented on the issue: https://github.com/apache/spark/pull/14270 Fixed the nits, resolved the merge conflict. Shortened some test names.
[GitHub] spark pull request #14270: [SPARK-5847][CORE] Allow for configuring MetricsS...
Github user markgrover commented on a diff in the pull request: https://github.com/apache/spark/pull/14270#discussion_r72357797

    --- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
    @@ -125,19 +126,26 @@ private[spark] class MetricsSystem private (
        * application, executor/driver and metric source.
        */
       private[spark] def buildRegistryName(source: Source): String = {
    -    val appId = conf.getOption("spark.app.id")
    +    val metricsNamespace = conf.get(METRICS_NAMESPACE).map(Some(_))

--- End diff --

Good point, thanks.
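For readers following along, the idea under discussion is that a configurable `spark.metrics.namespace` should take precedence over `spark.app.id` when building a metric registry name. The sketch below is illustrative only: the `Map`-based `conf` and the helper name are assumptions for this example, not Spark's actual `MetricsSystem` implementation.

```scala
// Illustrative sketch of namespace precedence when building a registry name.
// A plain Map stands in for SparkConf; this is not Spark's real API.
object NamespaceSketch {
  def buildRegistryName(conf: Map[String, String], sourceName: String): String = {
    // Prefer an explicit spark.metrics.namespace; fall back to spark.app.id.
    val namespace = conf.get("spark.metrics.namespace").orElse(conf.get("spark.app.id"))
    (namespace, conf.get("spark.executor.id")) match {
      case (Some(ns), Some(execId)) => s"$ns.$execId.$sourceName"
      case _                        => sourceName // no namespace or executor id: bare source name
    }
  }
}
```

With this shape, a driver/executor that sets neither key still gets a usable (unprefixed) metric name, which matches the fallback behavior the existing suite asserts.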
[GitHub] spark pull request #14270: [SPARK-5847][CORE] Allow for configuring MetricsS...
Github user markgrover commented on a diff in the pull request: https://github.com/apache/spark/pull/14270#discussion_r72357828

    --- Diff: core/src/test/scala/org/apache/spark/metrics/MetricsSystemSuite.scala ---
    @@ -183,4 +184,89 @@ class MetricsSystemSuite extends SparkFunSuite with BeforeAndAfter with PrivateM
         assert(metricName != s"$appId.$executorId.${source.sourceName}")
         assert(metricName === source.sourceName)
       }
    +
    +  test("MetricsSystem with Executor instance, with custom namespace") {
    +    val source = new Source {
    +      override val sourceName = "dummySource"
    +      override val metricRegistry = new MetricRegistry()
    +    }
    +
    +    val appId = "testId"
    +    val appName = "testName"
    +    val executorId = "1"
    +    conf.set("spark.app.id", appId)
    +    conf.set("spark.app.name", appName)
    +    conf.set("spark.executor.id", executorId)
    +    conf.set(METRICS_NAMESPACE, "${spark.app.name}")
    +
    +    val instanceName = "executor"
    +    val driverMetricsSystem = MetricsSystem.createMetricsSystem(instanceName, conf, securityMgr)
    +
    +    val metricName = driverMetricsSystem.buildRegistryName(source)
    +    assert(metricName === s"$appName.$executorId.${source.sourceName}")
    +  }
    +
    +  test("MetricsSystem with Executor instance and custom namespace which is not set") {
    +    val source = new Source {
    +      override val sourceName = "dummySource"
    +      override val metricRegistry = new MetricRegistry()
    +    }
    +
    +    val executorId = "1"
    +    val namespaceToResolve = "${spark.doesnotexist}"
    +    conf.set("spark.executor.id", executorId)
    +    conf.set(METRICS_NAMESPACE, namespaceToResolve)
    +
    +    val instanceName = "executor"
    +    val driverMetricsSystem = MetricsSystem.createMetricsSystem(instanceName, conf, securityMgr)
    +
    +    val metricName = driverMetricsSystem.buildRegistryName(source)
    +    // If the user set the spark.metrics.namespace property to an expansion of another property
    +    // (say ${spark.doesnotexist}), the unresolved name (i.e. literally ${spark.doesnotexist})

--- End diff --

Appreciate your thoroughness!
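The second test above relies on `${...}` substitution that leaves an unresolvable reference in place literally. A minimal sketch of that behavior, assuming a `Map`-backed config (this helper is an assumption for illustration, not Spark's actual substitution utility):

```scala
import scala.util.matching.Regex

// Illustrative sketch of ${property} expansion: resolvable references are
// replaced with their configured value; unresolvable ones stay literal.
object SubstituteSketch {
  private val Ref: Regex = """\$\{([^}]+)\}""".r

  def substitute(value: String, conf: Map[String, String]): String =
    Ref.replaceAllIn(value, m =>
      // quoteReplacement so a literal "${...}" survives untouched in the output
      Regex.quoteReplacement(conf.getOrElse(m.group(1), m.matched)))
}
```

Under this model, `"${spark.app.name}"` expands to the configured app name, while `"${spark.doesnotexist}"` passes through unchanged, which is exactly what the new test asserts.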