[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala
Github user BoleynSu commented on a diff in the pull request: https://github.com/apache/spark/pull/18836#discussion_r131316095

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -82,7 +82,7 @@ case class SortMergeJoinExec(
   override def outputOrdering: Seq[SortOrder] = joinType match {
     // For inner join, orders of both sides keys should be kept.
-    case Inner =>
+    case _: InnerLike =>
--- End diff --

I think we can get a SortMergeJoin plan with Cross, e.g. `select distinct a.i + 1, a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2`.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
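A minimal, self-contained sketch of why this one-character-class change matters. This is not Catalyst itself, just an assumed simplified mirror of its `JoinType` hierarchy, in which both `Inner` and `Cross` extend `InnerLike`; with the old `case Inner` pattern, a SortMergeJoin produced for a CROSS JOIN (as in the query above) would not report its output ordering:

```scala
// Simplified stand-in for Catalyst's join-type hierarchy (assumption:
// Inner and Cross both extend the InnerLike trait, as in Spark SQL).
sealed trait JoinType
sealed trait InnerLike extends JoinType
case object Inner extends InnerLike
case object Cross extends InnerLike
case object LeftOuter extends JoinType

// Before the patch this matched only Inner; with `_: InnerLike` the
// ordering of both sides' keys is also kept for cross joins.
def keepsOutputOrdering(joinType: JoinType): Boolean = joinType match {
  case _: InnerLike => true
  case _            => false
}
```

A type pattern (`_: InnerLike`) matches every subtype, so it covers any future `InnerLike` variants as well, unlike an exhaustive list of case objects.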
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131315665

--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
@@ -157,12 +168,8 @@ private[hive] object SparkSQLCLIDriver extends Logging {
     // Execute -i init files (always in silent mode)
     cli.processInitFiles(sessionState)
-    // Respect the configurations set by --hiveconf from the command line
-    // (based on Hive's CliDriver).
-    val it = sessionState.getOverriddenConfigurations.entrySet().iterator()
-    while (it.hasNext) {
-      val kv = it.next()
-      SparkSQLEnv.sqlContext.setConf(kv.getKey, kv.getValue)
+    newHiveConf.foreach{ kv =>
--- End diff --

`foreach{` -> `foreach {`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131314418

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging {
     propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
     propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
+    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
--- End diff --

@yaooqinn Please follow what @tejasapatil said and create a util function. In addition, `newTemporaryConfiguration` is being used for `SparkSQLCLIDriver`, and thus, please update the function description of `newTemporaryConfiguration`.
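A hedged sketch of the kind of util function being requested here (the object and method names are hypothetical, not the helper actually added in the PR): copy every `spark.hadoop.foo=bar` property into a Hadoop-style conf map as `foo=bar` by stripping the prefix:

```scala
object SparkHadoopConfUtils {
  private val Prefix = "spark.hadoop."

  // Keep only keys starting with "spark.hadoop." and strip that prefix,
  // so "spark.hadoop.fs.defaultFS" becomes the Hadoop key "fs.defaultFS".
  // Keys without the prefix are ignored rather than passed through.
  def extractHadoopProps(props: Map[String, String]): Map[String, String] =
    props.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }
}
```

Centralizing this in one helper keeps `HiveUtils.newTemporaryConfiguration` and `SparkSQLCLIDriver` from duplicating the prefix-stripping logic, which is the point of the review comment.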
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18820

> I don't think we should allow user to change field nullability while doing replace.

Why not? As long as we correctly update the schema from non-nullable to nullable, it seems OK to me. What would we be protecting against by disallowing this?
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18820

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80229/ Test PASSed.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18820

Merged build finished. Test PASSed.
[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user raajay commented on the issue: https://github.com/apache/spark/pull/18690

I understand. My previous comment was just a clarification to your question: "I'm not sure how does this code work in your changes?". I will close this PR. The JIRA is already closed.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18820

**[Test build #80229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80229/testReport)** for PR 18820 at commit [`dfbcaf3`](https://github.com/apache/spark/commit/dfbcaf3e47126eaf6fd3a0276054cad01dbff71a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18690: [SPARK-21334][CORE] Add metrics reporting service...
Github user raajay closed the pull request at: https://github.com/apache/spark/pull/18690
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742

**[Test build #80234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80234/testReport)** for PR 18742 at commit [`8a3c6d6`](https://github.com/apache/spark/commit/8a3c6d6fdceeb9f11454a96d9302bc17ca1cf1e9).
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131313635

--- Diff: R/pkg/tests/fulltests/test_mllib_regression.R ---
@@ -173,6 +173,14 @@ test_that("spark.glm summary", {
   expect_equal(stats$df.residual, rStats$df.residual)
   expect_equal(stats$aic, rStats$aic)
+  # Test spark.glm works with offset
+  training <- suppressWarnings(createDataFrame(iris))
+  stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species,
+                             family = poisson(), offsetCol = "Petal_Length"))
+  rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species,
+                             data = iris, family = poisson(), offset = iris$Petal.Length)))
--- End diff --

that's interesting - perhaps we should take `col` in addition to `col name` too
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829

I understand what you mean. These metrics URLs do not need to be displayed in the Web UI. Important metrics, such as 'aliveWorkers', could instead be surfaced in a header of the Web UI.
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18829

I'm not personally saying these metrics need to be in the Web UI. I'm just saying that if you think they're important enough to surface this way, then they should be important enough to display there as well.
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829

I accept your comments, thank you. Adding these important metrics to the Web UI is not a small amount of work, but I will try to do it.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131311220

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

We can move those test queries to a test suite like `DataFrameAggregateSuite`.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131311047

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Only methods like `Dataset.show` cause re-entrance of analyzed plans; `collect` won't. I guess the queries here are evaluated with `collect`.

    scala> sql("select 4, b, sum(b) from data group by 1, 2").show
    org.apache.spark.sql.AnalysisException: GROUP BY position 4 is not in select list (valid range is [1, 3]); line 1 pos 7

    scala> sql("select 4, b, sum(b) from data group by 1, 2").collect
    res2: Array[org.apache.spark.sql.Row] = Array([4,3,3], [4,4,4], [4,2,2])
[GitHub] spark issue #18813: [SPARK-21567][SQL] Dataset should work with type alias
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18813

ping @cloud-fan Do you have time to look at this? Thanks.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309985

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Do those queries actually test this bug? No matter whether I use `transform` or `resolveOperators`, they generate the same results.
[GitHub] spark issue #18840: [SPARK-21565] Propagate metadata in attribute replacemen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18840

**[Test build #80233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80233/testReport)** for PR 18840 at commit [`e54d812`](https://github.com/apache/spark/commit/e54d81200569c2260f0995b2f91aa9829dc10ad7).
[GitHub] spark pull request #18840: [SPARK-21565] Propagate metadata in attribute rep...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18840#discussion_r131309601

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala ---
@@ -391,6 +391,30 @@ class EventTimeWatermarkSuite extends StreamTest with BeforeAndAfter with Matche
     checkDataset[Long](df, 1L to 100L: _*)
   }

+  test("SPARK-21565: watermark operator accepts attributes from replacement") {
+    withTempDir { dir =>
+      dir.delete()
+
+      val df = Seq(("a", 100.0, new java.sql.Timestamp(100L)))
+        .toDF("symbol", "price", "eventTime")
+      df.write.json(dir.getCanonicalPath)
+
+      val input = spark.readStream.schema(df.schema)
+        .json(dir.getCanonicalPath)
+
+      val groupEvents = input
+        .withWatermark("eventTime", "2 seconds")
+        .groupBy("symbol", "eventTime")
+        .agg(count("price") as 'count)
+        .select("symbol", "eventTime", "count")
+      val q = groupEvents.writeStream
+        .outputMode("append")
+        .format("console")
+        .start()
+      q.processAllAvailable()
--- End diff --

nit: `q.processAllAvailable()` ->

    try {
      q.processAllAvailable()
    } finally {
      q.stop()
    }
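The nit above is the standard try/finally resource pattern: the query must be stopped even if `processAllAvailable()` throws. A standalone sketch with a hypothetical stand-in for the query handle (the real code would use Spark's `StreamingQuery`):

```scala
// Hypothetical stand-in for a streaming query handle, used only to
// demonstrate that the finally block runs cleanup on failure.
final class FakeQuery {
  var stopped = false
  def processAllAvailable(): Unit = throw new RuntimeException("boom")
  def stop(): Unit = stopped = true
}

// Run the query and always stop it, returning any failure instead of
// leaking a running query out of the test.
def runAndAlwaysStop(q: FakeQuery): Option[Throwable] =
  try {
    q.processAllAvailable()
    None
  } catch {
    case e: Throwable => Some(e)
  } finally {
    q.stop()
  }
```

Without the finally block, a failing assertion or processing error would leave the streaming query running, which can make later tests in the suite flaky.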
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309398

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Do those queries actually test this bug? No matter whether I use `transform` or `resolveOperators`, they generate the same results.
[GitHub] spark issue #18840: [SPARK-21565] Propagate metadata in attribute replacemen...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18840

ok to test
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309163

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
--- End diff --

OK, thanks
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309076

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Also, it fixes this query: `select 3 as c, 4 as d, sum(b) from data group by c, d;`
[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764

There are test failures in pyspark.ml.tests with Python 2.6, but I don't have that environment.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80230/ Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742

Merged build finished. Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742

**[Test build #80230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80230/testReport)** for PR 18742 at commit [`cd22c59`](https://github.com/apache/spark/commit/cd22c59ed6e9b11ca9690f11a20febda02e9c896).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r131308903

--- Diff: python/pyspark/sql/tests.py ---
@@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self):
         pdf = df.toPandas()
         self.spark.conf.set("spark.sql.execution.arrow.enable", "true")
         pdf_arrow = df.toPandas()
+        # need to remove timezone for comparison
+        pdf_arrow["7_timestamp_t"] = \
+            pdf_arrow["7_timestamp_t"].apply(lambda ts: ts.tz_localize(None))
--- End diff --

I talked to @gatorsmile and he suggested that we should have another configuration to control whether `df.toPandas()` handles timezones; the default behavior would be to not handle them, the same as the current behavior. I'll submit a PR doing this for the "without-Arrow" version asap, and I'd like you to follow that behavior.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131308502

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
--- End diff --

Could you move this line to just before line 68?
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131308421

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Except this line, does this PR fix any other queries?
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Merged build finished. Test PASSed.
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80227/ Test PASSed.
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18668 **[Test build #80227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80227/testReport)** for PR 18668 at commit [`5043eb6`](https://github.com/apache/spark/commit/5043eb69b41d1d0263e8814da27a934491bc936c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18840: [SPARK-21565] Propagate metadata in attribute replacemen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18840 Can one of the admins verify this patch?
[GitHub] spark pull request #18840: [SPARK-21565] Propagate metadata in attribute rep...
GitHub user joseph-torres opened a pull request: https://github.com/apache/spark/pull/18840 [SPARK-21565] Propagate metadata in attribute replacement. ## What changes were proposed in this pull request? Propagate metadata in attribute replacement during streaming execution. This is necessary for EventTimeWatermarks consuming replaced attributes. ## How was this patch tested? New unit test, which was verified to fail before the fix. You can merge this pull request into a Git repository by running: $ git pull https://github.com/joseph-torres/spark SPARK-21565 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18840.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18840 commit e54d81200569c2260f0995b2f91aa9829dc10ad7 Author: Jose Torres Date: 2017-08-04T03:52:57Z Propagate metadata in attribute replacement.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18779 **[Test build #80232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80232/testReport)** for PR 18779 at commit [`bfab7e5`](https://github.com/apache/spark/commit/bfab7e55609ed9c98534c23d41ff881a6eed84d5).
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18779 LGTM
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/18779 OK, thanks @viirya
[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18839 **[Test build #80231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80231/testReport)** for PR 18839 at commit [`23efcc6`](https://github.com/apache/spark/commit/23efcc687aabf8bebcf3d1a2647b60ca059f9d55).
[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18839 Some tests on the string form of the plan might fail. Let's see ...
[GitHub] spark pull request #18839: [SPARK-21634][SQL] Change OneRowRelation from a c...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18839 [SPARK-21634][SQL] Change OneRowRelation from a case object to case class ## What changes were proposed in this pull request? OneRowRelation is the only plan that is a case object, which causes some issues with makeCopy using a 0-arg constructor. This patch changes it from a case object to a case class. This blocks SPARK-21619. ## How was this patch tested? Should be covered by existing test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-21634 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18839.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18839 commit 23efcc687aabf8bebcf3d1a2647b60ca059f9d55 Author: Reynold Xin Date: 2017-08-04T03:39:34Z [SPARK-21634][SQL] Change OneRowRelation from a case object to case class
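The makeCopy issue described above stems from nodes being rebuilt reflectively through a constructor: a Scala `case object` compiles to a class that exposes no public constructor, while a 0-arg `case class` keeps one. A minimal sketch of that difference (toy stand-in types, not Spark's actual `TreeNode` or `OneRowRelation`):

```scala
// Toy sketch (not Spark's TreeNode): a makeCopy-style reflective rebuild
// needs a public constructor. A 0-arg case class has one; a case object
// does not, which is the kind of breakage the PR avoids.
case class RelationClass()   // hypothetical stand-in for the new OneRowRelation()
case object RelationObject   // hypothetical stand-in for the old OneRowRelation

// Count the public constructors visible to reflection.
def publicCtors(x: AnyRef): Int = x.getClass.getConstructors.length

// Rebuild a node via its first public constructor, as a makeCopy-style
// helper would; this throws if no public constructor exists.
def makeCopy(x: AnyRef): AnyRef = {
  val ctors = x.getClass.getConstructors
  require(ctors.nonEmpty, s"no public constructor for ${x.getClass.getName}")
  ctors.head.newInstance().asInstanceOf[AnyRef]
}
```

Calling `makeCopy(RelationClass())` succeeds because the case class exposes its primary constructor, whereas the singleton object offers nothing reflection can invoke.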
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18779 @10110346 Since using `resolveOperators` solves the whole bug, let's do that and simplify the change. Sorry for the confusion.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131305760 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala --- @@ -1,54 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin -import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.IntegerType - -/** - * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression. - */ -class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] { - private def isIntLiteral(e: Expression) = e match { -case Literal(_, IntegerType) => true -case _ => false - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { --- End diff -- Aha, sorry, I made a mistake here in inspecting Analyzer... Yeah, the whole bug is due to re-entrance of an analyzed plan. 
Currently we can solve it by using `resolveOperators`.
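The `resolveOperators` fix discussed here works because that traversal skips operators already marked as analyzed, whereas a plain `transform` re-applies the rule when the analyzer re-enters an already-analyzed plan. A toy Scala sketch of the difference (hypothetical one-node "plan", not Spark's actual `TreeNode`/`AnalysisHelper`):

```scala
// Toy sketch (hypothetical types): a rule applied with a plain transform
// fires again on analyzer re-entrance; a resolveOperators-style traversal
// marks nodes analyzed and skips them on the second pass.
final case class Plan(value: Int, analyzed: Boolean = false)

// The "rule": rewrites the node once; applying it a second time
// corrupts the plan, which models the re-entrance bug.
def rule(p: Plan): Plan = p.copy(value = p.value + 1)

// transform-style application: always fires, even on analyzed plans.
def transformApply(p: Plan): Plan = rule(p)

// resolveOperators-style application: skip already-analyzed nodes,
// making the rule idempotent across repeated analysis.
def resolveApply(p: Plan): Plan =
  if (p.analyzed) p else rule(p).copy(analyzed = true)
```

Running the transform-style version twice applies the rewrite twice, while the resolveOperators-style version leaves the plan unchanged on the second pass — which, per the discussion, is why switching the rule to `resolveOperators` fixes the re-entrance bug.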
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18829 I think if we really want these metrics in the UI we should look at adding them to the UI in some way, rather than as a link to a JSON dump. I am not a fan of JSON dumps as part of a UI in general; I think they defeat the purpose of creating a UI.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80230/testReport)** for PR 18742 at commit [`cd22c59`](https://github.com/apache/spark/commit/cd22c59ed6e9b11ca9690f11a20febda02e9c896).
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80226/ Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742 Merged build finished. Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80226/testReport)** for PR 18742 at commit [`bb3d998`](https://github.com/apache/spark/commit/bb3d9983bd9ee6cc758950aa84a530184db7b68d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18820 **[Test build #80229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80229/testReport)** for PR 18820 at commit [`dfbcaf3`](https://github.com/apache/spark/commit/dfbcaf3e47126eaf6fd3a0276054cad01dbff71a).
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18820 ok to test
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80228/ Test PASSed.
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829 The HBase Web UI has metrics, so the Spark Web UI should also have this feature. This is just my opinion. ![20](https://user-images.githubusercontent.com/26266482/28951812-e27abd9c-78ff-11e7-8213-c57d74716806.png) ![21](https://user-images.githubusercontent.com/26266482/28951819-e8edd394-78ff-11e7-99f9-d8e93873470a.png) ![23](https://user-images.githubusercontent.com/26266482/28951824-f0cfdbd4-78ff-11e7-879e-5667e051f0fe.png)
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18746 Merged build finished. Test PASSed.
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18746 **[Test build #80228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80228/testReport)** for PR 18746 at commit [`a30ae39`](https://github.com/apache/spark/commit/a30ae399923f2d4cef2cc00d88cae85fa5a2dde7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18779 Do we need to backport this to branch-2.2? I think the opinion depends on that decision. If not, I'm with your suggestion (keep this issue as a blocker for branch-2.3).
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Merged build finished. Test FAILed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80224/ Test FAILed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #80224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80224/testReport)** for PR 17980 at commit [`5db6acf`](https://github.com/apache/spark/commit/5db6acf2cde02217c283103ebcbfd2630338852c). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 Thanks @sethah. I strongly think we should update the comment or just delete it, as the current PR does. Another reason: there are three kinds of features: categorical, ordered categorical, and continuous. Only the first iteration over categorical features needs parentStats; the other two don't. The comment makes it sound as if the first iteration over every feature type needs parentStats.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131300093 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala --- @@ -1,54 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin -import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.IntegerType - -/** - * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression. 
- */ -class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] { - private def isIntLiteral(e: Expression) = e match { -case Literal(_, IntegerType) => true -case _ => false - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { --- End diff -- I run `SubstituteUnresolvedOrdinals` with `fixedPoint`.
[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80222/ Test PASSed.
[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18395 Merged build finished. Test PASSed.
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18746 **[Test build #80228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80228/testReport)** for PR 18746 at commit [`a30ae39`](https://github.com/apache/spark/commit/a30ae399923f2d4cef2cc00d88cae85fa5a2dde7).
[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18395 **[Test build #80222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80222/testReport)** for PR 18395 at commit [`76cb291`](https://github.com/apache/spark/commit/76cb291d21ead292e5dda2d20a206492e23a3120). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80221/
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131299271
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
@10110346 May I ask which one you run `SubstituteUnresolvedOrdinals` with: `Once` or `fixedPoint`?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18815 OK. So the master, worker, and executor logs can all be displayed in the web UI?
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131298714
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
For this change (`transform` -> `resolveOperators`): it fixes the whole bug; I have tested it.
@gatorsmile @viirya
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18668 **[Test build #80227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80227/testReport)** for PR 18668 at commit [`5043eb6`](https://github.com/apache/spark/commit/5043eb69b41d1d0263e8814da27a934491bc936c).
[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18690 So I think if you want to connect your custom sink to the Spark metrics system, you should at least follow what Spark and the codahale metrics library do. Adding a feature to Spark that works only for your own sink does not seem reasonable, unless it is a general requirement.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80226/testReport)** for PR 18742 at commit [`bb3d998`](https://github.com/apache/spark/commit/bb3d9983bd9ee6cc758950aa84a530184db7b68d).
[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18815 OK, then I'm really confused: if the logs we're talking about can already be viewed in the UI, why do we need to display their location on the system?
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18460 Merged build finished. Test PASSed.
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80223/
[GitHub] spark issue #18838: [SPARK-21632] There is no need to make attempts for crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18838 Can one of the admins verify this patch?
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18460 **[Test build #80223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80223/testReport)** for PR 18460 at commit [`3e1e81d`](https://github.com/apache/spark/commit/3e1e81d4412672cd479cd07aa786dea6328789aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18838: [SPARK-21632] There is no need to make attempts f...
GitHub user liu-zhaokun opened a pull request: https://github.com/apache/spark/pull/18838

[SPARK-21632] There is no need to make attempts for createDirectory if the dir had existed

[https://issues.apache.org/jira/browse/SPARK-21632](https://issues.apache.org/jira/browse/SPARK-21632)
There is no need to keep retrying `createDirectory` if the directory already exists, so I think we should log it and break out of the loop.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liu-zhaokun/spark master08040833

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18838.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18838

commit e3e36a1b920890f2e1873790775d4327cf9ee4fa
Author: liuzhaokun
Date: 2017-08-04T01:39:51Z

    [SPARK-21632] There is no need to make attempts for createDirectory if the dir had existed
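The behavior the PR proposes, noticing that a candidate directory already exists and moving on instead of treating it as one of the limited failure attempts, can be sketched as follows. Spark's real implementation is `Utils.createDirectory` in Scala; the Python function and names below are a hypothetical illustration only:

```python
import os
import uuid


def create_directory(root, max_attempts=10):
    """Create a fresh scratch directory under `root`.

    If a candidate path already exists, log it and pick a new name
    rather than counting the collision against the attempt budget
    as a hard failure.
    """
    for _ in range(max_attempts):
        candidate = os.path.join(root, "spark-%s" % uuid.uuid4())
        if os.path.exists(candidate):
            # Directory already exists: note it and retry with a new name.
            print("Candidate %s already exists, choosing a new name" % candidate)
            continue
        os.makedirs(candidate)
        return candidate
    raise IOError("Failed to create a directory under %s after %d attempts"
                  % (root, max_attempts))
```

With random UUID suffixes a collision is vanishingly unlikely, which is why the review discussion centers on whether the existence check deserves a log-and-continue rather than a failed attempt.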
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18829 I tend to agree with @ajbozarth: since we already have APIs to access the metrics dump in JSON format, this does not look necessary. Also, directly displaying such a JSON dump on the UI without formatting seems hard to use or check. AFAIK, users usually feed this to a monitoring system for display rather than checking it directly on the web UI.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18779 Well, maybe we should revisit this after #17770 gets merged. Because after that, we won't go through analyzed plans anymore. At that time, we can simply solve all the issues by making `SubstituteUnresolvedOrdinals` run with `Once`. What do you think @gatorsmile @10110346 @maropu?
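For readers unfamiliar with Catalyst's rule-execution strategies, the `Once` vs `FixedPoint` distinction under discussion can be sketched with a toy rule. Catalyst itself is Scala; this Python sketch and its names are illustrative assumptions, not Spark's API:

```python
def apply_once(rule, plan):
    # `Once` strategy: the rule batch runs in a single pass.
    return rule(plan)


def apply_fixed_point(rule, plan, max_iterations=100):
    # `FixedPoint` strategy: re-run the rule until the plan stops
    # changing, or the iteration budget is exhausted.
    for _ in range(max_iterations):
        new_plan = rule(plan)
        if new_plan == plan:
            break
        plan = new_plan
    return plan


def substitute_ordinals(plan):
    # Toy stand-in for SubstituteUnresolvedOrdinals: rewrite integer
    # literal nodes in a grouping list into "ordinal" marker nodes.
    return tuple(("ordinal", node[1]) if node[0] == "int" else node
                 for node in plan)
```

For an idempotent rule like this toy one, `Once` and `FixedPoint` agree; the thread above is precisely about rules where re-running over an already-analyzed plan is not safe.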
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131294567
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
Because it keeps the redundant Aliases at the top level. That may not be what we want.
[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18815 Why can the executor log be displayed in the web UI? I think the master, worker, and executor logs are the same in this regard; they should all be displayable in the web UI. ![image](https://user-images.githubusercontent.com/26266482/28950185-5fada808-78f4-11e7-9b42-e84dbed1f809.png)
[GitHub] spark pull request #18746: [ML][Python] UnaryTransformer in Python
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r131293810
--- Diff: python/pyspark/ml/tests.py ---
@@ -1957,6 +1988,40 @@ def test_chisquaretest(self):
         self.assertTrue(all(field in fieldNames for field in expectedFields))
 
+class UnaryTransformerTests(SparkSessionTestCase):
+
+    def test_unary_transformer_validate_input_type(self):
+        shiftVal = 3
+        transformer = MockUnaryTransformer(shiftVal=shiftVal)\
+            .setInputCol("input").setOutputCol("output")
+
+        # should not raise any errors
+        transformer.validateInputType(DoubleType())
+
+        with self.assertRaises(TypeError):
+            # passing the wrong input type should raise an error
+            transformer.validateInputType(IntegerType())
+
+    def test_unary_transformer_transform(self):
+        shiftVal = 3
+        transformer = MockUnaryTransformer(shiftVal=shiftVal)\
+            .setInputCol("input").setOutputCol("output")
+
+        df = self.spark.range(0, 10).toDF('input')
+        df = df.withColumn("input", df.input.cast(dataType="double"))
+
+        transformed_df = transformer.transform(df)
+        inputCol = transformed_df.select("input").collect()
--- End diff --
Do this instead:
```
results = transformed_df.select("input", "output").collect()
for res in results:
    self.assertEqual(res.input + shiftVal, res.output)
```
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829 The JSON metrics are very complete; they contain a lot of information the UI currently cannot show, and that information is also very important to application developers. As I understand it, Spark application developers observe cluster information in three ways: the web UI, logs, and metrics. But we are used to viewing information through the masterIP:8080 UI, rather than using the API or logging on to the server to view logs. The API is not very easy to remember.
[GitHub] spark issue #18746: [ML][Python] UnaryTransformer in Python
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18746 @ajaysaini725 Is there a JIRA for this PR? Please tag this PR in the title.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131292299
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
Changing `Aggregate(grouping.map(trimAliases), cleanedAggs, child)` to `Aggregate(grouping.map(trimNonTopLevelAliases), cleanedAggs, child)` in `CleanupAliases` looks like it can fix the whole bug.
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131284503
--- Diff: python/pyspark/ml/tests.py ---
@@ -1957,6 +1964,46 @@ def test_chisquaretest(self):
         self.assertTrue(all(field in fieldNames for field in expectedFields))
 
+class DefaultReadWriteTests(SparkSessionTestCase):
+
+    def test_default_read_write(self):
+        temp_path = tempfile.mkdtemp()
+
+        lr = LogisticRegression()
+        lr.setMaxIter(50)
+        lr.setThreshold(.75)
+        writer = DefaultParamsWriter(lr)
+
+        savePath = temp_path + "/lr"
+        writer.saveImpl(savePath)
+
+        reader = DefaultParamsReadable.read()
+        lr2 = reader.load(savePath)
+
+        self.assertEqual(lr.uid, lr2.uid)
+        self.assertEqual(lr.extractParamMap(), lr2.extractParamMap())
+
+    def test_default_read_write_with_overwrite(self):
--- End diff --
Since these tests are almost the same, can you combine them to reduce code duplication?
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131285744
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -471,3 +471,24 @@
     List((instance.uid, instance)) ++ subStageMaps
   }
 }
+
+private[ml] class FileSystemOverwrite extends Logging {
+
+  def handleOverwrite(path: String, shouldOverwrite: Boolean, sc: SparkContext): Unit = {
--- End diff --
This is the same now as the code in MLWriter.save, so please use this within MLWriter.save to eliminate the duplicated code.
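The deduplication being requested, a single overwrite check shared by every save path instead of two copies of the same logic, can be sketched as follows. The class and helper names here are hypothetical, not PySpark's or MLlib's actual API:

```python
import os
import shutil


def _handle_overwrite(path, should_overwrite):
    # The one place that implements the "path already exists" policy,
    # so MLWriter.save-style entry points don't each duplicate it.
    if os.path.exists(path):
        if not should_overwrite:
            raise IOError("Path %s already exists. Call overwrite() first." % path)
        shutil.rmtree(path)


class SimpleWriter:
    def __init__(self):
        self._shouldOverwrite = False

    def overwrite(self):
        # Fluent toggle, mirroring MLWriter.overwrite().
        self._shouldOverwrite = True
        return self

    def save(self, path):
        _handle_overwrite(path, self._shouldOverwrite)
        os.makedirs(path)  # stand-in for actually serializing metadata
```

Keeping the policy in one helper means a behavior change (say, a better error message) happens in exactly one place, which is the point of the review comment.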
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131286314
--- Diff: python/pyspark/ml/util.py ---
@@ -61,20 +66,74 @@ def _randomUID(cls):
 
 @inherit_doc
-class MLWriter(object):
+class BaseReadWrite(object):
+    """
+    Base class for MLWriter and MLReader. Stores information about the SparkContext
+    and SparkSession.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self):
+        self._sparkSession = None
+
+    def context(self, sqlContext):
--- End diff --
Leaving this is OK if you remove it from the subclasses, for the sake of code simplification.
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131287820
--- Diff: python/pyspark/ml/util.py ---
@@ -237,6 +300,13 @@ def _load_java_obj(cls, clazz):
             java_obj = getattr(java_obj, name)
         return java_obj
 
+    @classmethod
+    def _load_given_name(cls, java_class):
--- End diff --
This can be static since you don't use `cls`.
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288629

--- Diff: python/pyspark/ml/util.py ---
@@ -237,6 +300,13 @@ def _load_java_obj(cls, clazz):
             java_obj = getattr(java_obj, name)
         return java_obj
 
+    @classmethod
+    def _load_given_name(cls, java_class):
--- End diff --

Please also stick with camelCase for methods since that's what pyspark does. (here and elsewhere)
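Taken together, the two comments above amount to the before/after sketch below. The method body is illustrative only (not the PR's actual implementation); the point is the `@staticmethod` decorator replacing the unused `cls` and the camelCase rename.

```python
class ReaderBefore(object):
    # As submitted: a @classmethod whose `cls` argument is never used,
    # with a snake_case name.
    @classmethod
    def _load_given_name(cls, java_class):
        return java_class.split(".")[-1]


class ReaderAfter(object):
    # After review: @staticmethod (no unused `cls` parameter) and a
    # camelCase name, matching pyspark's method-naming convention.
    @staticmethod
    def _loadGivenName(java_class):
        return java_class.split(".")[-1]


print(ReaderAfter._loadGivenName("org.apache.spark.ml.Pipeline"))  # → Pipeline
```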
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288896

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
--- End diff --

add leading underscore to make this private
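The requested rename is purely a naming-convention change; this sketch reduces the class and method body to the minimum needed to show it (the `_save_metadata` body here is illustrative, not the PR's):

```python
class DefaultParamsWriterSketch(object):
    # The public entry point keeps its name...
    def saveImpl(self, path):
        return DefaultParamsWriterSketch._save_metadata(path)

    # ...while the helper gains a leading underscore, marking it as an
    # internal API that code outside the module should not call directly.
    @staticmethod
    def _save_metadata(path):
        return "%s/metadata" % path


print(DefaultParamsWriterSketch().saveImpl("/models/lr"))  # → /models/lr/metadata
```

Python does not enforce privacy; the underscore is a convention that documentation generators and linters honor, which is what the reviewer is asking for.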
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288360

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
+        uid = instance.uid
+        cls = instance.__module__ + '.' + instance.__class__.__name__
+        params = instance.extractParamMap()
+        jsonParams = {}
+        if paramMap is not None:
+            for p in paramMap:
+                jsonParams[p.name] = paramMap[p]
+        else:
+            for p in params:
+                jsonParams[p.name] = params[p]
+        basicMetadata = {"class": cls, "timestamp": long(round(time.time() * 1000)),
+                         "sparkVersion": sc.version, "uid": uid, "paramMap": jsonParams}
+        if extraMetadata is not None:
+            basicMetadata.update(extraMetadata)
+        return json.dumps(basicMetadata, separators=[',', ':'])
+
+
+@inherit_doc
+class DefaultParamsReadable(MLReadable):
+    """
+    Class for making simple Params types readable. Assumes that all parameters
--- End diff --

Could you please merge the Scala doc into this to make this more detailed?
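For readers following the diff above, this is a self-contained sketch of the metadata JSON that `get_metadata_to_save` assembles (class, millisecond timestamp, Spark version, uid, flat param map). The class name, uid, Spark version, and param values below are illustrative, not output from a real session:

```python
import json
import time


def metadata_to_json(cls_name, uid, spark_version, json_params, extra=None):
    # Mirrors the structure built in get_metadata_to_save: fixed keys plus
    # any caller-supplied extra metadata merged in on top.
    basic = {"class": cls_name,
             "timestamp": int(round(time.time() * 1000)),
             "sparkVersion": spark_version,
             "uid": uid,
             "paramMap": json_params}
    if extra is not None:
        basic.update(extra)
    return json.dumps(basic, separators=(',', ':'))


doc = json.loads(metadata_to_json("pyspark.ml.feature.Binarizer",
                                  "Binarizer_abc123", "2.3.0",
                                  {"threshold": 0.5}))
print(sorted(doc))  # → ['class', 'paramMap', 'sparkVersion', 'timestamp', 'uid']
```

Note every param value must itself be JSON-serializable, which is exactly the assumption the `DefaultParamsWritable` docstring states.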
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131287028

--- Diff: python/pyspark/ml/util.py ---
@@ -61,20 +66,74 @@ def _randomUID(cls):
 
 @inherit_doc
-class MLWriter(object):
+class BaseReadWrite(object):
+    """
+    Base class for MLWriter and MLReader. Stores information about the SparkContext
+    and SparkSession.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self):
+        self._sparkSession = None
+
+    def context(self, sqlContext):
+        """
+        Sets the Spark SQLContext to use for saving/loading.
+
+        .. note:: Deprecated in 2.1 and will be removed in 3.0, use session instead.
+        """
+        raise NotImplementedError("MLWriter is not yet implemented for type: %s" % type(self))
+
+    def session(self, sparkSession):
+        """
+        Sets the Spark Session to use for saving/loading.
+        """
+        self._sparkSession = sparkSession
+        return self
+
+    def sparkSession(self):
+        if self._sparkSession is None:
+            self._sparkSession = SparkSession.builder.getOrCreate()
+        return self._sparkSession
+
+    @property
+    def sc(self):
+        return self.sparkSession().sparkContext
+
+
+@inherit_doc
+class MLWriter(BaseReadWrite):
     """
     Utility class that can save ML instances.
 
     .. versionadded:: 2.0.0
     """
 
+    def __init__(self):
+        super(MLWriter, self).__init__()
+        self.shouldOverwrite = False
+
+    def _handleOverwrite(self, path):
+        from pyspark.ml.wrapper import JavaWrapper
+
+        _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
+        wrapper = JavaWrapper(_java_obj)
+        wrapper._call_java("handleOverwrite", path, True, self.sc._jsc.sc())
+
     def save(self, path):
         """Save the ML instance to the input path."""
+        if self.shouldOverwrite:
+            self._handleOverwrite(path)
+        self.saveImpl(path)
+
+    def saveImpl(self, path):
--- End diff --

Add doc from Scala
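The `save()`/`saveImpl()` split in the diff above is a template-method pattern: the base class owns the overwrite policy, subclasses fill in only the actual write. A minimal standalone sketch, using `shutil.rmtree` as a stand-in for the JVM-side `FileSystemOverwrite` handling that the real code reaches through `JavaWrapper`:

```python
import os
import shutil
import tempfile


class SketchWriter(object):
    def __init__(self):
        self.shouldOverwrite = False

    def overwrite(self):
        self.shouldOverwrite = True
        return self

    def save(self, path):
        # save() owns the overwrite policy; subclasses implement saveImpl().
        if self.shouldOverwrite and os.path.exists(path):
            shutil.rmtree(path)  # stand-in for FileSystemOverwrite.handleOverwrite
        self.saveImpl(path)

    def saveImpl(self, path):
        raise NotImplementedError("saveImpl is not implemented for type: %s" % type(self))


class DirWriter(SketchWriter):
    def saveImpl(self, path):
        os.makedirs(path)


target = os.path.join(tempfile.mkdtemp(), "model")
DirWriter().save(target)              # first save creates the directory
DirWriter().overwrite().save(target)  # second save succeeds only via overwrite()
print(os.path.isdir(target))  # → True
```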
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288786

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
--- End diff --

path is not used
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288351

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
--- End diff --

Could you please merge the Scala doc into this to make this more detailed?
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131290424

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
+        uid = instance.uid
+        cls = instance.__module__ + '.' + instance.__class__.__name__
+        params = instance.extractParamMap()
+        jsonParams = {}
+        if paramMap is not None:
+            for p in paramMap:
+                jsonParams[p.name] = paramMap[p]
+        else:
+            for p in params:
+                jsonParams[p.name] = params[p]
+        basicMetadata = {"class": cls, "timestamp": long(round(time.time() * 1000)),
+                         "sparkVersion": sc.version, "uid": uid, "paramMap": jsonParams}
+        if extraMetadata is not None:
+            basicMetadata.update(extraMetadata)
+        return json.dumps(basicMetadata, separators=[',', ':'])
+
+
+@inherit_doc
+class DefaultParamsReadable(MLReadable):
+    """
+    Class for making simple Params types readable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    @classmethod
+    def read(cls):
+        """Returns a DefaultParamsReader instance for this class."""
+        return DefaultParamsReader(cls)
+
+
+@inherit_doc
+class DefaultParamsReader(MLReader):
+    """
+    Class for reading Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, cls):
+        super(DefaultParamsReader, self).__init__()
+        self.cls = cls
+
+    @staticmethod
+    def __get_class(clazz):
+        """
+        Loads Python class from its name.
+        """
+        parts = clazz.split('.')
+        module = ".".join(parts[:-1])
+        m = __import__(module)
+        for comp in parts[1:]:
+            m = getattr(m, comp)
+        return m
+
+    def load(self, path):
+        metadata = DefaultParamsReader.loadMetadata(path, self.sc)
+        py_type = DefaultParamsReader.__get_class(metadata['class'])
+        instance = py_type()
+        instance._resetUid(metadata['uid'])
+        DefaultParamsReader.getAndSetParams(instance, metadata)
+        return instance
+
+    @staticmethod
+    def loadMetadata(path, sc, expectedClassName=""):
--- End diff --

These static methods can be private (add leading underscores)
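The `__get_class` helper quoted in the diff above is a small, general recipe worth seeing on its own: `__import__("a.b.c")` returns the top-level package, so the remaining dotted parts are walked with `getattr`. A standalone version using only the standard library:

```python
def get_class(clazz):
    # __import__("a.b.c") returns the top-level package "a"; walk the
    # remaining dotted parts with getattr to reach the final attribute.
    parts = clazz.split('.')
    module = ".".join(parts[:-1])
    m = __import__(module)
    for comp in parts[1:]:
        m = getattr(m, comp)
    return m


print(get_class("collections.OrderedDict").__name__)  # → OrderedDict
```

This also explains why the loader works for nested attributes such as `"os.path.join"`: everything after the first component is attribute traversal, not import machinery.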
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288910

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
--- End diff --

add leading underscore to make this private