[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79312/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18560: Revise rand comparison in BatchEvalPythonExecSuit...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/18560 Revise rand comparison in BatchEvalPythonExecSuite ## What changes were proposed in this pull request? Revise the rand comparison in BatchEvalPythonExecSuite. In BatchEvalPythonExecSuite, two cases use the predicate "rand() > 3". Rand() generates a random value in [0, 1), so comparing it with 3 is weird; use 0.3 instead. ## How was this patch tested? Unit test. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark revise_BatchEvalPythonExecSuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18560.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18560 commit 67841437eb2cffec5686fafd07cb1233a1e5072a Author: Wang Gengliang Date: 2017-07-07T05:50:24Z revise BatchEvalPythonExecSuite
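The reasoning behind the change can be sketched outside Spark. This is a minimal illustration in plain Python with hypothetical sample counts, not the suite's actual code; it relies only on the fact that SQL's rand(), like Python's random.random(), returns values in the half-open interval [0, 1):

```python
import random

# Draw samples from the same range as SQL rand(): [0, 1).
samples = [random.random() for _ in range(10_000)]

# A threshold of 3 can never match, so "rand() > 3" is a vacuous predicate:
# the filter under test drops every row regardless of the data.
print(sum(s > 3 for s in samples))  # always 0

# A threshold of 0.3 actually exercises both branches of the filter.
print(sum(s > 0.3 for s in samples) > 0)
```

With a vacuous predicate the test would pass even if the filter logic were broken, which is why 0.3 is the more meaningful threshold.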
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17758#discussion_r126074450 --- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql --- @@ -0,0 +1,23 @@ +-- Catch case-sensitive name duplication +SET spark.sql.caseSensitive=true; + +CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet; --- End diff -- We should keep them in one place. For now I think we still need to put them in `DDLSuite` because we need to run it with and without hive support. Can we pick some typical test cases here and move them to `DDLSuite`?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Merged build finished. Test FAILed.
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79312/testReport)** for PR 18559 at commit [`7279262`](https://github.com/apache/spark/commit/72792627d76e0e3452f84af1322a35e3f0d82580). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17758#discussion_r126074238 --- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql --- @@ -0,0 +1,23 @@ +-- Catch case-sensitive name duplication +SET spark.sql.caseSensitive=true; + +CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet; --- End diff -- In `DDLSuite`, we already have simple tests for duplicate columns. Would it be better to move these tests there?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Merged build finished. Test FAILed.
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17758#discussion_r126073404 --- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql --- @@ -0,0 +1,23 @@ +-- Catch case-sensitive name duplication +SET spark.sql.caseSensitive=true; + +CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet; --- End diff -- We didn't have test cases for create table before?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79311/ Test FAILed.
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79311/testReport)** for PR 18559 at commit [`4d99c11`](https://github.com/apache/spark/commit/4d99c11802efa2d6ee5c36de5941226bf12e1a55). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18444 Thanks for asking @ueshin. Sounds OK to me too. I currently have some pending review comments for minor nits. Let me finish mine today.
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- yea that seems wrong ...
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072760 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- It's been supported since 2.0.x, so there are definitely existing user queries and apps. I agree with this PR and want to understand the scope of changes. It looks good to me.

```scala
scala> sc.version
res0: String = 2.0.2

scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")

scala> sql("SELECT v.i from (SELECT i FROM v)").show
+---+
| i|
+---+
| 1|
+---+
```
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072118 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- we may have, but this is definitely wrong IMO. BTW at least we don't have this usage in our tests, so I think it's probably fine. also cc @rxin
[GitHub] spark pull request #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector shoul...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/18557
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 Yep. It's officially internal. What I meant by `performance issue` is that 3rd parties can still use it, and there might be a performance gap between `float` and `double`. I'll close this PR. Thank you again.
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126071406 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Do we have such usage in existing queries?
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18444 LGTM, pending Jenkins. @HyukjinKwon, @holdenk, Do you have any other concerns?
[GitHub] spark issue #609: SPARK-1691: Support quoted arguments inside of spark-submi...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/609 @ganeshm25 It seems to work in newer Spark versions; I haven't tried Spark 1.4.2. However, it's still very tricky to get right, and I would prefer a simpler solution.
[GitHub] spark pull request #18462: [SPARK-21333][Docs] Removed invalid joinTypes fro...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18462#discussion_r126071195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1007,6 +1007,10 @@ class Dataset[T] private[sql]( JoinType(joinType), Some(condition.expr))).analyzed.asInstanceOf[Join] +if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) { + throw new AnalysisException("Invalid join type in joinWith: " + joined.joinType) --- End diff -- Nit: `joined.joinType.sql`?
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126071223 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Then, the scope of the breaking change is reduced to this kind of query?

```scala
scala> sc.version
res0: String = 2.1.1

scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")

scala> sql("SELECT v.i from (SELECT i FROM v)").show
+---+
| i|
+---+
| 1|
+---+
```
[GitHub] spark pull request #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics w...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18558
[GitHub] spark issue #609: SPARK-1691: Support quoted arguments inside of spark-submi...
Github user ganeshm25 commented on the issue: https://github.com/apache/spark/pull/609 @koertkuipers I am trying to run multiple driver-java-options with Spark 1.4.2 inside a bash script. Is there a solution you found for this?
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18444 **[Test build #79314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79314/testReport)** for PR 18444 at commit [`f2774c6`](https://github.com/apache/spark/commit/f2774c639fdf653ec7d48127b529124dbbb9b60b).
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18558 thanks, merging to master!
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 We didn't change `spark.shuffle.io.numConnectionsPerPeer`. Our biggest cluster has 6000 `NodeManager`s, and there are 50 executors running on the same host at the same time.
[GitHub] spark pull request #18425: [SPARK-21217][SQL] Support ColumnVector.Array.to<...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18425
[GitHub] spark issue #18425: [SPARK-21217][SQL] Support ColumnVector.Array.toAr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18425 thanks, merging to master!
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 @cloud-fan To be honest, it's a little bit tricky to reject "open blocks" by closing the connection; the subsequent reconnection will surely have extra cost. In the current change we rely on the retry mechanism of `RetryingBlockFetcher`. `spark.shuffle.io.maxRetries` and `spark.shuffle.io.retryWait` should also be tuned; with this change their meanings may become different, and users should know this. This is the sacrifice for compatibility. It occurs to me: could we add back `OpenBlocksFailed` behind a flag (default false)? If users want to turn it on, we can tell them they should upgrade the client.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18444 Jenkins, retest this please.
[GitHub] spark pull request #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18553
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18557 `ColumnVector` is totally internal in Spark 2.2, so there won't be a 3rd-party Spark library issue.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18558 Merged build finished. Test PASSed.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79309/ Test PASSed.
[GitHub] spark issue #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should hand...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18553 Thanks for reviewing! merging to master.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18558 **[Test build #79309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79309/testReport)** for PR 18558 at commit [`dedafd9`](https://github.com/apache/spark/commit/dedafd95835ddd65118825d74c4592f35b73b3d8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18388
> there are 200K+ connections and 3.5M blocks(FileSegmentManagedBuffer) being fetched.

Did you use a large `spark.shuffle.io.numConnectionsPerPeer`? If not, the number of connections seems too large, since each ShuffleClient should have only one connection to one shuffle service. How large is your cluster, and how many applications are running at the same time?
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 BTW, thank you for the swift reviews and feedback on my PR. :)
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 Analyzing the heap dump, there are 200K+ connections and 3.5M blocks (`FileSegmentManagedBuffer`) being fetched. Yes, flow control is a good idea, but I still think it makes sense to control the concurrency: by rejecting some "open blocks" requests, we can keep sufficient bandwidth for the existing connections and finish the reduce tasks as soon as possible. Simple flow control (slowing down connections under pressure) can help avoid OOM, but it seems more reduce tasks will run longer.
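The concurrency control jinxing64 describes — reject new "open blocks" requests under memory pressure rather than merely throttling existing connections — can be sketched as a simple admission check. The object name and threshold below are illustrative only, not Spark's actual shuffle-service code:

```scala
// Illustrative sketch (hypothetical names, not Spark's ExternalShuffleService):
// track bytes owned by in-flight "open blocks" requests and reject new
// requests once a cap is exceeded, so existing fetches keep their bandwidth
// and finish sooner.
object OpenBlocksAdmission {
  private val maxInFlightBytes = 512L * 1024 * 1024 // hypothetical cap
  private var inFlightBytes = 0L

  // Returns true if admitted; a real server would answer `false` with an
  // error response, prompting the client to back off and retry.
  def tryAdmit(requestBytes: Long): Boolean = synchronized {
    if (inFlightBytes + requestBytes > maxInFlightBytes) {
      false
    } else {
      inFlightBytes += requestBytes
      true
    }
  }

  // Called when a fetch completes and its buffers are released.
  def release(requestBytes: Long): Unit = synchronized {
    inFlightBytes -= requestBytes
  }
}
```

The trade-off against pure flow control is visible here: a rejected request frees the server entirely instead of holding a slowed-down connection open.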
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 I know that 'there is no usage of this API internally in Spark 2.2', but that is only for 2.2.0. My reason was that no 3rd-party Spark library can use `ColumnVector` for the `float` type in Spark 2.2.1+. Anyway, @cloud-fan changed the bug type. If that means backporting is not allowed for this patch, I have no objection to the community decision. So, @kiszk and @cloud-fan, given that, may I close this PR?
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18307 **[Test build #79313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79313/testReport)** for PR 18307 at commit [`3b548cc`](https://github.com/apache/spark/commit/3b548cc3d5ad8928785fe644db9ea788dfb8fad2).
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18307 retest this please
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18557 I've changed the ticket type from `bug` to `improvement`; adding a new API is not fixing a bug.
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126068180

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
         .map(col => col.getName).toSet
-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+    object ExtractableLiteral {
+      def unapply(expr: Expression): Option[String] = expr match {
+        case Literal(value, _: IntegralType) => Some(value.toString)
+        case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+        case _ => None
+      }
+    }
+
+    object ExtractableLiterals {
+      def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+        exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+          case (Some(accum), Some(value)) => Some(accum :+ value)
+          case _ => None
+        }
+      }
+    }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, ExtractableLiterals(values)) if !varcharKeys.contains(a.name) =>
+        val or = values
+          .map(value => s"${a.name} = $value")
+          .reduce(_ + " or " + _)
+        "(" + or + ")"
+      case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
         if !varcharKeys.contains(a.name) =>
-        s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-      case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+        s"${a.name} ${op.symbol} $value"
+      case op @ BinaryComparison(ExtractableLiteral(value), a: Attribute)
         if !varcharKeys.contains(a.name) =>
-        s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-    }.mkString(" and ")
+        s"$value ${op.symbol} ${a.name}"
+      case op @ And(expr1, expr2) =>
+        s"(${convert(expr1)} and ${convert(expr2)})"
+      case op @ Or(expr1, expr2) =>
+        s"(${convert(expr1)} or ${convert(expr2)})"
+    }
+
+    filters.flatMap(f => Try(convert(f)).toOption).mkString(" and ")
--- End diff --

I do think we should follow `InMemoryTableScanExec.buildFilters`. For example, if the left side of `And` is not supported but the right side is, we can still push down the right side. But here, we simply catch the exception and push nothing.
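The per-conjunct behavior cloud-fan asks for can be sketched on a toy expression tree (the classes below are illustrative, not Spark's `Expression` hierarchy): instead of dropping a whole `And` when one side fails to convert, convert each side independently and keep whichever side succeeds.

```scala
// Toy model of the pushdown behavior from InMemoryTableScanExec.buildFilters:
// an And whose sides convert independently keeps whichever side succeeds.
sealed trait Pred
case class Convertible(sql: String) extends Pred // predicate Hive accepts
case object Unconvertible extends Pred           // predicate Hive cannot take
case class Conj(left: Pred, right: Pred) extends Pred

def convert(p: Pred): Option[String] = p match {
  case Convertible(s) => Some(s)
  case Unconvertible  => None
  case Conj(l, r) =>
    (convert(l), convert(r)) match {
      case (Some(a), Some(b)) => Some(s"($a and $b)")
      case (Some(a), None)    => Some(a) // push down only the left side
      case (None, Some(b))    => Some(b) // push down only the right side
      case (None, None)       => None
    }
}
```

Under the `Try(...).toOption` approach in the diff, `Conj(Unconvertible, Convertible("p = 1"))` would yield nothing at all; the sketch above still pushes `p = 1`.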
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18557 We have not seen any failure in test suites. And, [there is no usage of this API](https://github.com/apache/spark/pull/17836#discussion_r114488839) in Spark 2.2. Does this missing API cause any failure in a test or application program? If so, it would be good to put a sample program in this PR.
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126067892

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
         .map(col => col.getName).toSet
-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+    object ExtractableLiteral {
+      def unapply(expr: Expression): Option[String] = expr match {
+        case Literal(value, _: IntegralType) => Some(value.toString)
+        case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+        case _ => None
+      }
+    }
+
+    object ExtractableLiterals {
+      def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+        exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+          case (Some(accum), Some(value)) => Some(accum :+ value)
+          case _ => None
+        }
+      }
+    }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, ExtractableLiterals(values)) if !varcharKeys.contains(a.name) =>
--- End diff --

cc @gatorsmile, any concerns about not doing it?
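The `ExtractableLiterals` extractor under discussion is essentially an Option-sequencing fold: a sequence of `Option` values collapses to `Some(values)` only when every element converted. A standalone sketch of that idea (hypothetical helper name, not Spark code):

```scala
// Mirrors the foldLeft inside ExtractableLiterals.unapply: one None in the
// input makes the whole result None; otherwise the values are accumulated.
def sequence[A](xs: Seq[Option[A]]): Option[Seq[A]] =
  xs.foldLeft(Option(Seq.empty[A])) {
    case (Some(accum), Some(value)) => Some(accum :+ value)
    case _                          => None
  }
```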
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16697 LGTM, pending tests
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18556 Thank you @cloud-fan!
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126067471

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
         .map(col => col.getName).toSet
-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
-        if !varcharKeys.contains(a.name) =>
-        s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-      case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
-        if !varcharKeys.contains(a.name) =>
-        s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-    }.mkString(" and ")
+    def isExtractable(expr: Expression): Boolean =
+      expr match {
+        case Literal(_, _: IntegralType) | Literal(_, _: StringType) => true
+        case _ => false
+      }
+
+    def extractValue(expr: Expression): String =
+      expr match {
+        case Literal(v, _: IntegralType) => v.toString
+        case Literal(v, _: StringType) => quoteStringLiteral(v.toString)
+      }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, exprs)
+          if !varcharKeys.contains(a.name) && exprs.forall(isExtractable) =>
+        val or = exprs
+          .map(expr => s"${a.name} = ${extractValue(expr)}")
+          .reduce(_ + " or " + _)
+        "(" + or + ")"
+      case op @ BinaryComparison(a: Attribute, expr2)
--- End diff --

how about `ExtractLiteralToString`?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79312/testReport)** for PR 18559 at commit [`7279262`](https://github.com/apache/spark/commit/72792627d76e0e3452f84af1322a35e3f0d82580).
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18556
[GitHub] spark pull request #18288: [SPARK-21066][ML] LibSVM load just one input file
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18288
[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should cache weightCol if ne...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 I'm not familiar with R, but grepping for "OneVsRest" returns nothing. Hence it seems that nothing needs to be done on the R side.
[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18523 @SparkQA test again, please.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 Hi, @kiszk. I think this is a bug fix of `ColumnVector` as described in [SPARK-20566](https://issues.apache.org/jira/browse/SPARK-20566).
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18556 LGTM, merging to master!
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126066952

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -102,6 +104,25 @@ object MLUtils extends Logging {
       .map(parseLibSVMRecord)
   }

+  private[spark] def parseLibSVMFile(
+      sparkSession: SparkSession, paths: Seq[String]): RDD[(Double, Array[Int], Array[Double])] = {
+    val lines = sparkSession.baseRelationToDataFrame(
+      DataSource.apply(
+        sparkSession,
+        paths = paths,
+        className = classOf[TextFileFormat].getName
+      ).resolveRelation(checkFilesExist = false))
+      .select("value")
--- End diff --

Is this needed? I think the text format is known to have only one column.
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18559 LGTM
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066595

--- Diff: sql/core/src/test/resources/sql-tests/results/string-functions.sql.out ---
@@ -30,20 +30,20 @@ abc
 -- !query 3
 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col
-FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t
+FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10))
 -- !query 3 schema
 struct
 -- !query 3 output
 == Parsed Logical Plan ==
 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x]
-+- 'SubqueryAlias t
++- 'SubqueryAlias _auto_generated_subquery_name
--- End diff --

I think it's ok, as the name makes it quite clear that it's auto-generated. And I think it's hard to hide it.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18558 LGTM pending jenkins, also cc @rxin
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79311/testReport)** for PR 18559 at commit [`4d99c11`](https://github.com/apache/spark/commit/4d99c11802efa2d6ee5c36de5941226bf12e1a55).
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066489

--- Diff: sql/core/src/test/resources/sql-tests/results/string-functions.sql.out ---
@@ -30,20 +30,20 @@ abc
 -- !query 3
 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col
-FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t
+FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10))
 -- !query 3 schema
 struct
 -- !query 3 output
 == Parsed Logical Plan ==
 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x]
-+- 'SubqueryAlias t
++- 'SubqueryAlias _auto_generated_subquery_name
--- End diff --

Do we want to show the internal subquery name?
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066311

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -751,15 +751,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
    * hooks.
    */
   override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = withOrigin(ctx) {
-    // The unaliased subqueries in the FROM clause are disallowed. Instead of rejecting it in
-    // parser rules, we handle it here in order to provide better error message.
-    if (ctx.strictIdentifier == null) {
-      throw new ParseException("The unaliased subqueries in the FROM clause are not supported.",
-        ctx)
+    val alias = if (ctx.strictIdentifier == null) {
+      // For un-aliased subqueries, ues a default alias name that is not likely to conflict with
--- End diff --

nit: typo `ues`.
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/18559 [SPARK-21335][SQL] support un-aliased subquery

## What changes were proposed in this pull request?

Un-aliased subqueries have been supported by Spark SQL for a long time. Their semantics were not well defined and had confusing behaviors, and since this is not standard SQL syntax, we disallowed it in https://issues.apache.org/jira/browse/SPARK-20690 . However, that was a breaking change, and we do have existing queries that use un-aliased subqueries. We should add the support back and fix its semantics. This PR fixes un-aliased subqueries by assigning them a default alias name.

## How was this patch tested?

new regression test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark sub-query

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18559

commit 4d99c11802efa2d6ee5c36de5941226bf12e1a55
Author: Wenchen Fan
Date: 2017-07-07T04:03:34Z

    support un-aliased subquery
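The core of the fix can be sketched as a tiny helper (illustrative only, not the actual `AstBuilder` change): when the FROM-clause subquery carries no user alias, fall back to the internal default name that appears in the new test output, instead of throwing a `ParseException`.

```scala
// Hypothetical helper mirroring the PR's behavior: a user-supplied alias
// wins; otherwise the internal default alias is assigned.
def subqueryAlias(userAlias: Option[String]): String =
  userAlias.getOrElse("_auto_generated_subquery_name")
```

So `FROM (SELECT ...) t` keeps the alias `t`, while the previously rejected `FROM (SELECT ...)` now parses with the generated alias.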
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18559 cc @rxin @viirya
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18557 @dongjoon-hyun Is there any reason to backport this to previous versions? I ask because we had [a discussion](https://github.com/apache/spark/pull/17836#pullrequestreview-35957231) about this. Obviously, it makes sense to support it in the latest version.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 Hi, @cloud-fan. This is the backport of #17836.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18557 Merged build finished. Test PASSed.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18557 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79306/
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18557 **[Test build #79306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79306/testReport)** for PR 18557 at commit [`39839bf`](https://github.com/apache/spark/commit/39839bf5b70aab603e538d424cda00ec7cde1402). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16697 **[Test build #79310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79310/testReport)** for PR 16697 at commit [`554cd39`](https://github.com/apache/spark/commit/554cd391b3ddb5fb3f7c52950610e832ad40047b).
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79308/
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Merged build finished. Test PASSed.
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #79308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79308/testReport)** for PR 18465 at commit [`c08ccd5`](https://github.com/apache/spark/commit/c08ccd59f438fce1f841aa70f760ffb9dc24cf50). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16697 retest this please
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18558 **[Test build #79309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79309/testReport)** for PR 18558 at commit [`dedafd9`](https://github.com/apache/spark/commit/dedafd95835ddd65118825d74c4592f35b73b3d8).
[GitHub] spark issue #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should hand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18553 Merged build finished. Test PASSed.
[GitHub] spark issue #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should hand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18553 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79304/
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18558 cc @cloud-fan This removes the writeTime metrics.
[GitHub] spark issue #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should hand...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18553 **[Test build #79304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79304/testReport)** for PR 18553 at commit [`15b7497`](https://github.com/apache/spark/commit/15b7497b76c031488b8ec414f1363f3393f0a3e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics w...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18558 [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with data writes onto DataFrameWriter operations ## What changes were proposed in this pull request? Remove the write-time metric, since there seems to be no way to measure it without per-row tracking. ## How was this patch tested? Existing tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-20703-followup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18558 commit dedafd95835ddd65118825d74c4592f35b73b3d8 Author: Liang-Chi Hsieh Date: 2017-07-07T02:35:48Z Remove time metrics since it seems no way to measure it in non per-row tracking.
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79307/
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18556 Merged build finished. Test PASSed.
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18556 **[Test build #79307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79307/testReport)** for PR 18556 at commit [`b345cb1`](https://github.com/apache/spark/commit/b345cb14758ae6f16d699b2f38b17eefbf316468). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #79308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79308/testReport)** for PR 18465 at commit [`c08ccd5`](https://github.com/apache/spark/commit/c08ccd59f438fce1f841aa70f760ffb9dc24cf50).
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 (simply rebased)
[GitHub] spark pull request #18482: [SPARK-21262] Stop sending 'stream request' when ...
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/18482
[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r126056717

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -314,21 +339,40 @@ object FileFormatWriter extends Logging {
           recordsInFile = 0
           releaseResources()
+          numOutputRows += recordsInFile
           newOutputWriter(fileCounter)
         }

         val internalRow = iter.next()
+        val startTime = System.nanoTime()
         currentWriter.write(internalRow)
+        timeOnCurrentFile += (System.nanoTime() - startTime)
--- End diff --

Yeah, I also considered this option.
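The pattern under discussion, wrapping each row's write in two `System.nanoTime()` reads and accumulating the delta into a per-file counter, can be sketched as follows. This is an illustrative standalone sketch, not the actual FileFormatWriter code; the `RowWriter` interface and all names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of per-row write timing as in the diff above.
public class PerRowTiming {
    interface RowWriter {
        void write(String row);
    }

    static long timeOnCurrentFile = 0L;

    static void writeAll(List<String> rows, RowWriter writer) {
        for (String row : rows) {
            long startTime = System.nanoTime();
            writer.write(row);
            // Two clock reads per record: this is the per-row overhead
            // that ultimately led to dropping the write-time metric.
            timeOnCurrentFile += System.nanoTime() - startTime;
        }
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        writeAll(List.of("a", "b", "c"), sink::append);
        if (timeOnCurrentFile < 0 || !sink.toString().equals("abc")) {
            throw new AssertionError("unexpected result");
        }
        System.out.println("ok");
    }
}
```

The cost is two monotonic-clock reads per record, which is why the follow-up PR (#18558) removes the metric rather than keeping per-row tracking.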
[GitHub] spark issue #18482: [SPARK-21262] Stop sending 'stream request' when shuffle...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18482 Sure, I will update the document soon.
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79305/
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18556 Merged build finished. Test PASSed.
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18556 **[Test build #79305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79305/testReport)** for PR 18556 at commit [`de33d6d`](https://github.com/apache/spark/commit/de33d6d8809e87edbe42c2dfab4b914da29c7143). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126053740

--- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -89,18 +93,14 @@ private[libsvm] class LibSVMFileFormat extends TextBasedFileFormat with DataSour
       files: Seq[FileStatus]): Option[StructType] = {
     val libSVMOptions = new LibSVMOptions(options)
     val numFeatures: Int = libSVMOptions.numFeatures.getOrElse {
-      // Infers number of features if the user doesn't specify (a valid) one.
-      val dataFiles = files.filterNot(_.getPath.getName startsWith "_")
-      val path = if (dataFiles.length == 1) {
-        dataFiles.head.getPath.toUri.toString
-      } else if (dataFiles.isEmpty) {
-        throw new IOException("No input path specified for libsvm data")
-      } else {
-        throw new IOException("Multiple input paths are not supported for libsvm data.")
-      }
-
-      val sc = sparkSession.sparkContext
-      val parsed = MLUtils.parseLibSVMFile(sc, path, sc.defaultParallelism)
+      require(files.nonEmpty, "No input path specified for libsvm data")
--- End diff --

Please refer https://github.com/apache/spark/pull/18556#discussion_r126045375.
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18556 **[Test build #79307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79307/testReport)** for PR 18556 at commit [`b345cb1`](https://github.com/apache/spark/commit/b345cb14758ae6f16d699b2f38b17eefbf316468).
[GitHub] spark pull request #18509: [SPARK-21329][SS] Make EventTimeWatermarkExec exp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18509
[GitHub] spark issue #18509: [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18509 Thanks! Merging to master.
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126051332

--- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -89,18 +93,17 @@ private[libsvm] class LibSVMFileFormat extends TextBasedFileFormat with DataSour
       files: Seq[FileStatus]): Option[StructType] = {
     val libSVMOptions = new LibSVMOptions(options)
     val numFeatures: Int = libSVMOptions.numFeatures.getOrElse {
-      // Infers number of features if the user doesn't specify (a valid) one.
-      val dataFiles = files.filterNot(_.getPath.getName startsWith "_")
-      val path = if (dataFiles.length == 1) {
-        dataFiles.head.getPath.toUri.toString
-      } else if (dataFiles.isEmpty) {
+      if (files.isEmpty) {
         throw new IOException("No input path specified for libsvm data")
--- End diff --

Actually, that should be right after this function call so probably fine :). Yea, but at least using `require` should be shorter.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18557 **[Test build #79306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79306/testReport)** for PR 18557 at commit [`39839bf`](https://github.com/apache/spark/commit/39839bf5b70aab603e538d424cda00ec7cde1402).
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126050849

--- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -89,18 +93,17 @@ private[libsvm] class LibSVMFileFormat extends TextBasedFileFormat with DataSour
       files: Seq[FileStatus]): Option[StructType] = {
     val libSVMOptions = new LibSVMOptions(options)
     val numFeatures: Int = libSVMOptions.numFeatures.getOrElse {
-      // Infers number of features if the user doesn't specify (a valid) one.
-      val dataFiles = files.filterNot(_.getPath.getName startsWith "_")
-      val path = if (dataFiles.length == 1) {
-        dataFiles.head.getPath.toUri.toString
-      } else if (dataFiles.isEmpty) {
+      if (files.isEmpty) {
        throw new IOException("No input path specified for libsvm data")
--- End diff --

In my opinion, it is safe / necessary to check whether the parameter is valid in advance. Perhaps `IllegalArgumentException` is more suitable.
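For context on this exchange: Scala's `require(cond, message)` throws `IllegalArgumentException` with the message prefixed by "requirement failed: " when the predicate is false, so the `require` suggestion and the `IllegalArgumentException` suggestion amount to the same behavior. A minimal Java analogue of the two styles being compared (all names here are hypothetical, not the LibSVMRelation code):

```java
import java.util.List;

// Sketch of the validation styles discussed above: a require-like helper
// (what Scala's require produces) versus a checked IOException.
public class InputValidation {
    // Analogue of Scala's Predef.require(cond, message).
    static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException("requirement failed: " + message);
        }
    }

    // Hypothetical stand-in for the schema-inference entry point.
    static int inferNumFeatures(List<String> files) {
        require(!files.isEmpty(), "No input path specified for libsvm data");
        return files.size(); // placeholder for the real feature-count inference
    }

    public static void main(String[] args) {
        String caught = "";
        try {
            inferNumFeatures(List.of());
        } catch (IllegalArgumentException e) {
            caught = e.getMessage();
        }
        System.out.println(caught);
    }
}
```

The practical difference is that `IllegalArgumentException` is unchecked and signals a caller error, while `IOException` is checked and conventionally signals an environmental failure.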
[GitHub] spark pull request #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector shoul...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/18557 [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should support `appendFloats` for array ## What changes were proposed in this pull request? This PR aims to add a missing `appendFloats` API for array into **ColumnVector** class. For double type, there is `appendDoubles` for array [here](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L818-L824). ## How was this patch tested? Pass the Jenkins with a newly added test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-20566-BRANCH-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18557.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18557 commit 39839bf5b70aab603e538d424cda00ec7cde1402 Author: Dongjoon Hyun Date: 2017-05-04T13:04:15Z [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should support `appendFloats` for array This PR aims to add a missing `appendFloats` API for array into **ColumnVector** class. For double type, there is `appendDoubles` for array [here](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L818-L824). Pass the Jenkins with a newly added test case. Author: Dongjoon Hyun Closes #17836 from dongjoon-hyun/SPARK-20566.
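The `appendDoubles` array API linked in the PR description takes a count, a source array, and a source offset, copies that many elements into the vector, and returns the starting row id. A simplified on-heap sketch of what the float variant looks like (illustrative only; the real ColumnVector also handles nulls, capacity policy, and off-heap storage):

```java
import java.util.Arrays;

// Simplified sketch of an appendFloats(count, src, srcIndex) API in the
// style of ColumnVector.appendDoubles; not the actual Spark class.
public class FloatVector {
    private float[] data = new float[4];
    private int elementsAppended = 0;

    // Grow the backing array when needed (real ColumnVector has a
    // more elaborate reservation policy).
    private void reserve(int capacity) {
        if (capacity > data.length) {
            data = Arrays.copyOf(data, Math.max(capacity, data.length * 2));
        }
    }

    // Bulk-append count floats from src starting at srcIndex;
    // returns the row id where the copied range begins.
    public int appendFloats(int count, float[] src, int srcIndex) {
        reserve(elementsAppended + count);
        System.arraycopy(src, srcIndex, data, elementsAppended, count);
        int result = elementsAppended;
        elementsAppended += count;
        return result;
    }

    public float getFloat(int rowId) {
        return data[rowId];
    }

    public static void main(String[] args) {
        FloatVector v = new FloatVector();
        int start = v.appendFloats(3, new float[]{1f, 2f, 3f}, 0);
        System.out.println(start + " " + v.getFloat(2));
    }
}
```

The bulk copy via `System.arraycopy` is the point of the array overloads: appending element by element would pay a bounds-and-capacity check per value.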