[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 The order is different from the original one that is evaluated in the join conditions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r134932083

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -32,7 +32,9 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 * in that file.
 */
class HadoopFileLinesReader(
-    file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
+    file: PartitionedFile,
+    lineSeparator: Option[String],
--- End diff --

What is the default line separator when none is specified? That is not clear here.
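The question above can be made concrete with a small sketch. The names below are illustrative only, not the actual patch: the idea is that when no separator is supplied, Hadoop's `LineRecordReader` falls back to its built-in handling of `\n`, `\r`, and `\r\n`, while a custom separator would be converted to bytes and passed through.

```scala
object LineSeparatorSketch {
  // Hypothetical helper: None means "use Hadoop's default line-ending
  // detection"; Some(sep) means "split on exactly these bytes".
  def separatorBytes(lineSeparator: Option[String]): Option[Array[Byte]] =
    lineSeparator.map(_.getBytes("UTF-8"))
}
```

Whatever the patch ultimately does, documenting that `None` means "platform/Hadoop default" on the constructor parameter would answer the reviewer's question.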
[GitHub] spark pull request #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading r...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/18962#discussion_r134932043

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -330,19 +332,21 @@ object SparkSubmit extends CommandLineUtils {
     args.archives = Option(args.archives).map(resolveGlobPaths(_, hadoopConf)).orNull

     // In client mode, download remote files.
+    var localPrimaryResource: String = null
+    var localJars: String = null
+    var localPyFiles: String = null
--- End diff --

@tgravescs, if we also download files/archives to a local path, how would we leverage them? We don't expose the path to the user, so even with the previous code the downloaded files could never be used by the driver. For semantic completeness, we still need to change some code to support this feature, as @vanzin mentioned. I agree with you that the current state of the code is confusing for users (some resources are downloaded while others are not). I think we could fix that in a follow-up PR; what do you think?
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134931404

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
         generator => copyCurrentStructure(generator, parser)
       }

-      row(idx) = UTF8String.fromBytes(output.toByteArray)
+      val jsonValue = UTF8String.fromBytes(output.toByteArray)
+      row(idx) = jsonValue
+      idx = idx + 1
+
+      // SPARK-21804: json_tuple returns null values within repeated columns
+      // except the first one; so that we need to check the remaining fields.
+      while (idx < fieldNames.length) {
+        if (fieldNames(idx) == jsonField) {
+          row(idx) = jsonValue
+        }
+        idx = idx + 1
+      }
--- End diff --

I was also wondering whether we should use a hash table. However, the number of columns is not large, so it might not yield a noticeable benefit.
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134931120

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
         generator => copyCurrentStructure(generator, parser)
       }

-      row(idx) = UTF8String.fromBytes(output.toByteArray)
+      val jsonValue = UTF8String.fromBytes(output.toByteArray)
+      row(idx) = jsonValue
+      idx = idx + 1
+
+      // SPARK-21804: json_tuple returns null values within repeated columns
+      // except the first one; so that we need to check the remaining fields.
+      while (idx < fieldNames.length) {
+        if (fieldNames(idx) == jsonField) {
+          row(idx) = jsonValue
+        }
+        idx = idx + 1
+      }
--- End diff --

```Scala
row(idx) = jsonValue
idx = idx + 1

// SPARK-21804: json_tuple returns null values within repeated columns
// except the first one; so that we need to check the remaining fields.
while (idx < fieldNames.length) {
  if (fieldNames(idx) == jsonField) {
    row(idx) = jsonValue
  }
  idx = idx + 1
}
```

->

```Scala
do {
  row(idx) = jsonValue
  idx = fieldNames.indexOf(jsonField, idx + 1)
} while (idx >= 0)
```
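The suggested `do`/`while` rewrite can be checked in isolation. Below is a minimal, self-contained sketch (the `fillRepeated` helper and its signature are hypothetical, written only to exercise the loop shape): starting from the first matching slot, it copies the parsed value into every later slot whose requested field name matches, using `indexOf` with a start offset instead of a manual scan.

```scala
object JsonTupleSketch {
  // Stand-in for JsonTuple's row-filling step: `firstIdx` is the index of the
  // first occurrence of `jsonField` in `fieldNames`; every later occurrence
  // gets the same parsed value, matching the SPARK-21804 fix.
  def fillRepeated(
      fieldNames: IndexedSeq[String],
      jsonField: String,
      jsonValue: String,
      firstIdx: Int): Array[String] = {
    val row = Array.fill[String](fieldNames.length)(null)
    var idx = firstIdx
    do {
      row(idx) = jsonValue
      idx = fieldNames.indexOf(jsonField, idx + 1) // -1 when no more matches
    } while (idx >= 0)
    row
  }
}
```

Both loop shapes are O(n) over the remaining fields; the `indexOf` form just pushes the comparison into the collections library.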
[GitHub] spark issue #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading remote r...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18962 Yes @tgravescs, if we download everything to local and then upload to YARN, http/https/ftp should be unrelated here. But in yarn cluster mode, if we specify remote http jars, the YARN client will still fail to handle them, so the issue still exists. I'm going to create a separate JIRA to track it.
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19018 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81070/ Test PASSed.
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19018 Merged build finished. Test PASSed.
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19018 **[Test build #81070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81070/testReport)** for PR 19018 at commit [`05b3cab`](https://github.com/apache/spark/commit/05b3cabaff89396d352ece41d57c7fd9eb2ef917).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18730 Merged build finished. Test FAILed.
[GitHub] spark issue #19028: [MINOR][SQL] The comment of Class ExchangeCoordinator ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19028 Merged build finished. Test PASSed.
[GitHub] spark issue #19028: [MINOR][SQL] The comment of Class ExchangeCoordinator ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81063/ Test PASSed.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18730 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81064/ Test FAILed.
[GitHub] spark issue #19028: [MINOR][SQL] The comment of Class ExchangeCoordinator ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19028 **[Test build #81063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81063/testReport)** for PR 19028 at commit [`837536f`](https://github.com/apache/spark/commit/837536fee8427f9b527ace401924f9a703ba38d7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81064/testReport)** for PR 18730 at commit [`72aef67`](https://github.com/apache/spark/commit/72aef679b498bb042ecb9ffa8df62ed41e1f519d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...
Github user neoremind commented on the issue: https://github.com/apache/spark/pull/18964 @zsxwing I did create a performance test against Spark RPC; the results can be found [here](https://github.com/neoremind/kraps-rpc#4-performance-test). Note that I created the project for study purposes and the code is based on 2.1.0. But as you said, performance does not drop when the client fails to apply the `SO_RCVBUF` and `SO_SNDBUF` values set in `SparkConf`. For example, with 10 concurrent callers, 100k total calls, and everything else at defaults, QPS is around 11k. When I set `SO_RCVBUF` and `SO_SNDBUF` to an extremely small number like 100, performance suffers tremendously. If they are set to a large number like 128k, the results are not affected by whether the client sets the corresponding `SO_RCVBUF` and `SO_SNDBUF` values or not. I admit the change is trivial, but from a user's perspective, if `spark.{module}.io.sendBuffer` and `spark.{module}.io.receiveBuffer` are exposed and can be set, yet only take effect on the server side, that is a bit inconsistent. So I raised this PR to make them work on both the server and the client side, just to keep them consistent.
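The configuration lookup being discussed can be illustrated with a small, hypothetical sketch (the `bufferSizes` helper and the `Map`-based conf are stand-ins, not Spark's actual `TransportConf`): a missing or non-positive value conventionally means "leave the OS default socket buffer size", which is also why a tiny explicit value like 100 bytes hurts throughput so badly.

```scala
object RpcClientBufferSketch {
  // Illustrative only: resolves the send/receive socket buffer sizes for a
  // given module (e.g. "rpc" or "shuffle") from a flat key-value conf.
  // -1 signals "not set; keep the OS default".
  def bufferSizes(conf: Map[String, String], module: String): (Int, Int) = {
    def get(key: String): Int =
      conf.get(s"spark.$module.io.$key").map(_.toInt).getOrElse(-1)
    (get("sendBuffer"), get("receiveBuffer"))
  }
}
```

With this convention, a client bootstrap would only call the socket-option setters when the resolved value is positive, leaving the kernel's autotuning in place otherwise.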
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18730 Merged build finished. Test PASSed.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81061/ Test PASSed.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81061/testReport)** for PR 18730 at commit [`bab91db`](https://github.com/apache/spark/commit/bab91db933947b57159b21e5f6506570b6b721cb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19027 LGTM. Btw, I'm just curious why we need tests with `numpy` here.
[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 Will probably take a look through the problem in the near future, including hard dependencies etc. I took a quick look and I think I need more time, but yes, it apparently looks like a valid point.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81060/ Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Merged build finished. Test PASSed.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19031 Merged build finished. Test PASSed.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81062/ Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81060/testReport)** for PR 18581 at commit [`47e8d37`](https://github.com/apache/spark/commit/47e8d3761681611a9ee6d50d6c812babd395dace).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19031 **[Test build #81062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81062/testReport)** for PR 19031 at commit [`9438655`](https://github.com/apache/spark/commit/94386550523baf5f98427d3ef0b9f9815cee4c69).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19018 **[Test build #81070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81070/testReport)** for PR 19018 at commit [`05b3cab`](https://github.com/apache/spark/commit/05b3cabaff89396d352ece41d57c7fd9eb2ef917).
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19018 jenkins, retest this please
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81059/ Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Merged build finished. Test PASSed.
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134925698

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
         generator => copyCurrentStructure(generator, parser)
       }

-      row(idx) = UTF8String.fromBytes(output.toByteArray)
+      val jsonValue = UTF8String.fromBytes(output.toByteArray)
+      row(idx) = jsonValue
+      idx = idx + 1
+
+      // SPARK-21804: json_tuple returns null values within repeated columns
+      // except the first one; so that we need to check the remaining fields.
+      while (idx < fieldNames.length) {
+        if (fieldNames(idx) == jsonField) {
+          row(idx) = jsonValue
+        }
+        idx = idx + 1
+      }
--- End diff --

Would you maybe have a suggestion? The current status looks fine.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81059/testReport)** for PR 18581 at commit [`3555e5d`](https://github.com/apache/spark/commit/3555e5dafa85dcee404599c78b17cbb97b1709f0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user jmchung commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134925669

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
         generator => copyCurrentStructure(generator, parser)
       }

-      row(idx) = UTF8String.fromBytes(output.toByteArray)
+      val jsonValue = UTF8String.fromBytes(output.toByteArray)
+      row(idx) = jsonValue
+      idx = idx + 1
+
+      // SPARK-21804: json_tuple returns null values within repeated columns
+      // except the first one; so that we need to check the remaining fields.
+      while (idx < fieldNames.length) {
+        if (fieldNames(idx) == jsonField) {
+          row(idx) = jsonValue
+        }
+        idx = idx + 1
+      }
--- End diff --

If I comment out L451-452, the repeated fields still receive the same jsonValue because `fieldNames(idx) == jsonField`; but that first comparison is unnecessary, since `idx >= 0` already means a match. Could you please give me some advice?
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19018 But I think in general it's better to make tests more predictable like this.
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19018 the error specifically is: `The input column stridx_87ea3065aeb2 should have at least two distinct values.` I don't think this would be only happening in R - I suppose whenever the string label has only one distinct value, ml's random forest will just give up like this.
[GitHub] spark pull request #19016: [SPARK-21805][SPARKR] Disable R vignettes code on...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19016
[GitHub] spark pull request #18945: Add option to convert nullable int columns to flo...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18945#discussion_r134925269
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1762,7 +1762,7 @@ def toPandas(self): else:
--- End diff --
If we use this approach, how about the following to check if the type corrections are needed:
```python
dtype = {}
for field in self.schema:
    pandas_type = _to_corrected_pandas_type(field.dataType)
    if pandas_type is not None and not (field.nullable and pdf[field.name].isnull().any()):
        dtype[field.name] = pandas_type
```
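The nullability guard in the suggestion above exists because of a pandas behavior worth spelling out: an integer column that contains a null has no representation in the classic integer dtypes, so coercing it back to an int dtype would fail or lose the null. A small standalone illustration (plain pandas, independent of Spark):

```python
import numpy as np
import pandas as pd

# A column with no nulls can safely be narrowed to an integer dtype...
clean = pd.Series([1.0, 2.0, 3.0]).astype(np.int32)

# ...but a column containing a null cannot stay integer-typed in
# classic pandas dtypes; pandas keeps it as float64 with NaN instead.
nullable = pd.Series([1.0, None, 3.0])
```

This is why the type-correction dict is only populated for fields that are non-nullable or actually contain no nulls.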
[GitHub] spark issue #19016: [SPARK-21805][SPARKR] Disable R vignettes code on Window...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19016 thanks, merged to master/2.2. will check for nightly build from tonight.
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134923877
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
           generator => copyCurrentStructure(generator, parser)
         }
-        row(idx) = UTF8String.fromBytes(output.toByteArray)
+        val jsonValue = UTF8String.fromBytes(output.toByteArray)
+        row(idx) = jsonValue
+        idx = idx + 1
+
+        // SPARK-21804: json_tuple returns null values within repeated columns
+        // except the first one; so that we need to check the remaining fields.
+        while (idx < fieldNames.length) {
+          if (fieldNames(idx) == jsonField) {
+            row(idx) = jsonValue
+          }
+          idx = idx + 1
+        }
--- End diff --
You still can simplify the codes a lot without functional transformation.
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134923403
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
           generator => copyCurrentStructure(generator, parser)
         }
-        row(idx) = UTF8String.fromBytes(output.toByteArray)
+        val jsonValue = UTF8String.fromBytes(output.toByteArray)
+        row(idx) = jsonValue
+        idx = idx + 1
+
+        // SPARK-21804: json_tuple returns null values within repeated columns
+        // except the first one; so that we need to check the remaining fields.
+        while (idx < fieldNames.length) {
+          if (fieldNames(idx) == jsonField) {
+            row(idx) = jsonValue
+          }
+          idx = idx + 1
+        }
--- End diff --
We have followed @HyukjinKwon's suggestion to avoid functional transformation with a while, since this is a hot path.
[GitHub] spark pull request #19017: [SPARK-21804][SQL] json_tuple returns null values...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19017#discussion_r134923130
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -447,7 +448,18 @@ case class JsonTuple(children: Seq[Expression])
           generator => copyCurrentStructure(generator, parser)
         }
-        row(idx) = UTF8String.fromBytes(output.toByteArray)
+        val jsonValue = UTF8String.fromBytes(output.toByteArray)
+        row(idx) = jsonValue
+        idx = idx + 1
+
+        // SPARK-21804: json_tuple returns null values within repeated columns
+        // except the first one; so that we need to check the remaining fields.
+        while (idx < fieldNames.length) {
+          if (fieldNames(idx) == jsonField) {
+            row(idx) = jsonValue
+          }
+          idx = idx + 1
+        }
--- End diff --
Could you rewrite it in short? More Scala?
[GitHub] spark issue #19032: [SPARK-17321][YARN] Avoid writing shuffle metadata to di...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19032 Merged build finished. Test PASSed.
[GitHub] spark issue #19032: [SPARK-17321][YARN] Avoid writing shuffle metadata to di...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81067/ Test PASSed.
[GitHub] spark issue #19032: [SPARK-17321][YARN] Avoid writing shuffle metadata to di...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19032 **[Test build #81067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81067/testReport)** for PR 19032 at commit [`5abbe75`](https://github.com/apache/spark/commit/5abbe75072cf3f172f0b2e448941b94d72268c90).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81069/testReport)** for PR 18730 at commit [`aeabe1d`](https://github.com/apache/spark/commit/aeabe1d1aacf5abf58d631bc291dd409728b5569).
[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19027 I'm ok without the test since this is unlikely to break in the future. We do have tests that depend (optionally) on numpy (and Arrow) - it seems like we should be able to take on dependencies more formally so we could test them properly?
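Optional test dependencies like numpy are commonly handled by probing the import once and skipping the dependent tests when it is absent. A generic unittest sketch of that pattern (the class name is hypothetical; this is not how PySpark's own suite is necessarily structured):

```python
import unittest

# Probe the optional dependency once at module import time.
try:
    import numpy as np
    have_numpy = True
except ImportError:
    have_numpy = False


class VectorUDTTests(unittest.TestCase):
    # skipIf marks the test as skipped (not failed) when numpy is missing,
    # so the suite still passes on machines without the optional package.
    @unittest.skipIf(not have_numpy, "numpy not installed")
    def test_numpy_roundtrip(self):
        arr = np.array([1.0, 2.0])
        self.assertEqual(arr.sum(), 3.0)
```

The suite reports success either way: the test runs when numpy is present and is recorded as skipped otherwise.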
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18730 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81068/ Test FAILed.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18730 Merged build finished. Test FAILed.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81068/testReport)** for PR 18730 at commit [`4789772`](https://github.com/apache/spark/commit/478977293aadb9383740eabbaee23a43cc64b062).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81068/testReport)** for PR 18730 at commit [`4789772`](https://github.com/apache/spark/commit/478977293aadb9383740eabbaee23a43cc64b062).
[GitHub] spark issue #19032: [SPARK-17321][YARN] Avoid writing shuffle metadata to di...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19032 **[Test build #81067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81067/testReport)** for PR 19032 at commit [`5abbe75`](https://github.com/apache/spark/commit/5abbe75072cf3f172f0b2e448941b94d72268c90).
[GitHub] spark issue #19032: [SPARK-17321][YARN] Avoid writing shuffle metadata to di...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19032 CC @lishuming please take a look at another approach to fix the bad disk issue. Also ping @tgravescs to view the PR. Thanks a lot.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19031 LGTM
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/18730 I mocked some local tests for the two different APIs. From the simple test results, we can see that slice does not affect the performance of writing bytes. Test results below:
```
【Test 10 chunks each with 30m for 1 loop】
Time cost with 1 loop for writeFully(): 83 ms
Time cost with 1 loop for writeWithSlice(): 76 ms
【Ending】
【Test 10 chunks each with 100m for 1 loop】
Time cost with 1 loop for writeFully(): 219 ms
Time cost with 1 loop for writeWithSlice(): 213 ms
【Ending】
【Test 10 chunks each with 30m for 10 loop】
Time cost with 10 loop for writeFully(): 982 ms
Time cost with 10 loop for writeWithSlice(): 1000 ms
【Ending】
【Test 10 chunks each with 100m for 10 loop】
Time cost with 10 loop for writeFully(): 3298 ms
Time cost with 10 loop for writeWithSlice(): 3454 ms
【Ending】
【Test 10 chunks each with 30m for 50 loop】
Time cost with 50 loop for writeFully(): 3444 ms
Time cost with 50 loop for writeWithSlice(): 3329 ms
【Ending】
【Test 10 chunks each with 100m for 50 loop】
Time cost with 50 loop for writeFully(): 21913 ms
Time cost with 50 loop for writeWithSlice(): 17574 ms
【Ending】
```
Test code below:
```
test("benchmark testing") {
  // scalastyle:off
  val buffer100 = ByteBuffer.allocate(1024 * 1024 * 100)
  val buffer30 = ByteBuffer.allocate(1024 * 1024 * 30)
  testWithLoop(1, new ChunkedByteBuffer(Array.fill(10)(buffer30)),
    "Test 10 chunks each with 30m for 1 loop")
  testWithLoop(1, new ChunkedByteBuffer(Array.fill(10)(buffer100)),
    "Test 10 chunks each with 100m for 1 loop")
  testWithLoop(10, new ChunkedByteBuffer(Array.fill(10)(buffer30)),
    "Test 10 chunks each with 30m for 10 loop")
  testWithLoop(10, new ChunkedByteBuffer(Array.fill(10)(buffer100)),
    "Test 10 chunks each with 100m for 10 loop")
  testWithLoop(50, new ChunkedByteBuffer(Array.fill(10)(buffer30)),
    "Test 10 chunks each with 30m for 50 loop")
  testWithLoop(50, new ChunkedByteBuffer(Array.fill(10)(buffer100)),
    "Test 10 chunks each with 100m for 50 loop")
}

// scalastyle:off
private def testWithLoop(loopTimes: Int, chunkedByteBuffer: ChunkedByteBuffer, testString: String) {
  System.out.println(s"【$testString】")
  var starTime = System.currentTimeMillis()
  for (i <- 1 to loopTimes) {
    chunkedByteBuffer.writeFully(new ByteArrayWritableChannel(chunkedByteBuffer.size.toInt))
  }
  System.out.println(s"Time cost with $loopTimes loop for writeFully(): ${Utils.getUsedTimeMs(starTime)}")
  starTime = System.currentTimeMillis()
  for (i <- 1 to loopTimes) {
    chunkedByteBuffer.writeWithSlice(new ByteArrayWritableChannel(chunkedByteBuffer.size.toInt))
  }
  System.out.println(s"Time cost with $loopTimes loop for writeWithSlice(): ${Utils.getUsedTimeMs(starTime)}")
  System.out.println("【Ending】")
  System.out.println("")
}
```
[GitHub] spark pull request #19032: [SPARK-17321][YARN] Avoid writing shuffle metadat...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/19032 [SPARK-17321][YARN] Avoid writing shuffle metadata to disk if NM recovery is disabled

## What changes were proposed in this pull request?

In the current code, if NM recovery is not enabled then `YarnShuffleService` will write shuffle metadata to NM local dir-1; if this local dir-1 is on a bad disk, then `YarnShuffleService` will fail to start. To solve this issue, on the Spark side, if NM recovery is not enabled then Spark will not persist data into leveldb. In that case the YARN shuffle service can still serve, but loses the ability to recover (which is fine, because the failure of the NM will kill the containers as well as the applications).

## How was this patch tested?

Tested in a local cluster with NM recovery off and on to see whether the folder is created or not. A MiniCluster UT isn't added because in MiniCluster the NM will always set the port to 0, but NM recovery requires a non-ephemeral port.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-17321

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19032.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19032

commit 5abbe75072cf3f172f0b2e448941b94d72268c90
Author: jerryshao
Date: 2017-08-24T03:28:48Z

Avoid writing shuffle metadata to disk if NM recovery is disabled

Change-Id: Id062d71589f46052706058c151c706dae38b1e6e
[GitHub] spark issue #19017: [SPARK-21804][SQL] json_tuple returns null values within...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19017 **[Test build #81066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81066/testReport)** for PR 19017 at commit [`ff01e04`](https://github.com/apache/spark/commit/ff01e04a8c9f1f8447c3b536f9288d3f6eaf62be).
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19031 **[Test build #81065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81065/testReport)** for PR 19031 at commit [`a0854ad`](https://github.com/apache/spark/commit/a0854ad16003020eaa6f0d1f1c08db726b9196e2).
[GitHub] spark issue #19017: [SPARK-21804][SQL] json_tuple returns null values within...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19017 LGTM too.
[GitHub] spark pull request #19022: [Spark-21807][SQL]Override ++ operation in Expres...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19022
[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18968#discussion_r134920136
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala ---
@@ -274,17 +274,24 @@ object ScalarSubquery {
 case class ListQuery(
     plan: LogicalPlan,
     children: Seq[Expression] = Seq.empty,
-    exprId: ExprId = NamedExpression.newExprId)
+    exprId: ExprId = NamedExpression.newExprId,
+    childOutputs: Seq[Attribute] = Seq.empty)
   extends SubqueryExpression(plan, children, exprId) with Unevaluable {
-  override def dataType: DataType = plan.schema.fields.head.dataType
+  override def dataType: DataType = if (childOutputs.length > 1) {
+    childOutputs.toStructType
+  } else {
+    childOutputs.head.dataType
+  }
+  override lazy val resolved: Boolean = childrenResolved && plan.resolved && childOutputs.nonEmpty
--- End diff --
Before we fill in `childOutputs`, this `ListQuery` cannot be resolved. Otherwise, accessing its `dataType` causes a failure in `In.checkInputDataTypes`.
[GitHub] spark issue #19022: [Spark-21807][SQL]Override ++ operation in ExpressionSet...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19022 Thanks! Merging to master. You can fix this in your future PRs.
[GitHub] spark pull request #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable m...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19031#discussion_r134920085
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -577,9 +577,11 @@ object SQLConf {
     .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
       "When the generated function exceeds this threshold, " +
       "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-      "The default value 4000 is the max length of byte code JIT supported " +
-      "for a single function(8000) divided by 2.")
+      "The default value 2667 is the max length of byte code JIT supported " +
+      "for a single function(8000) divided by 2. Use -1 to disable this.")
     .intConf
+    .checkValue(maxLines => maxLines >= -1, "The maximum must not be a negative integer, -1 to " +
+      "always activate whole-stage codegen.")
--- End diff --
ok
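The `checkValue` call in the diff above validates a setting against a predicate when it is assigned, rejecting anything below -1 while letting -1 act as the "disabled" sentinel. A generic sketch of that validated-config pattern (a hypothetical helper for illustration, not Spark's `ConfigBuilder` API):

```python
class IntConf:
    """Minimal stand-in for a config entry with a validation predicate:
    every assignment (including the default) runs through the check."""

    def __init__(self, name, default, check, message):
        self.name, self.check, self.message = name, check, message
        self.value = self._validated(default)

    def _validated(self, v):
        if not self.check(v):
            raise ValueError(f"{self.name}: {self.message}")
        return v

    def set(self, v):
        self.value = self._validated(v)


max_lines = IntConf(
    "spark.sql.codegen.maxLinesPerFunction", 2667,
    check=lambda v: v >= -1,
    message="must be >= -1; -1 disables the limit")

max_lines.set(-1)   # accepted: -1 means "always activate whole-stage codegen"
```

Attempting `max_lines.set(-2)` raises `ValueError`, mirroring how the builder rejects invalid values at definition time rather than deep inside query planning.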
[GitHub] spark pull request #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable m...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19031#discussion_r134920074
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -577,9 +577,11 @@ object SQLConf {
     .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
       "When the generated function exceeds this threshold, " +
       "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-      "The default value 4000 is the max length of byte code JIT supported " +
-      "for a single function(8000) divided by 2.")
+      "The default value 2667 is the max length of byte code JIT supported " +
--- End diff --
missed...
[GitHub] spark pull request #19022: [Spark-21807][SQL]Override ++ operation in Expres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19022#discussion_r134919964
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
@@ -210,4 +210,13 @@ class ExpressionSetSuite extends SparkFunSuite {
     assert((initialSet - (aLower + 1)).size == 0)
   }
+
+  test("add multiple elements to set") {
+    val initialSet = ExpressionSet(aUpper + 1 :: Nil)
+    val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil)
+    val setToAddWithOutSameExpression = ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil)
--- End diff --
Nit: `WithOut` -> `Without`
[GitHub] spark issue #19022: [Spark-21807][SQL]Override ++ operation in ExpressionSet...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19022 LGTM
[GitHub] spark issue #19017: [SPARK-21804][SQL] json_tuple returns null values within...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19017 LGTM
[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18997 Merged build finished. Test PASSed.
[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81058/ Test PASSed.
[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18997 **[Test build #81058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81058/testReport)** for PR 18997 at commit [`bbb0b0e`](https://github.com/apache/spark/commit/bbb0b0eb5a3517bb6c278588c2a66d4b6da8027f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19017: [SPARK-21804][SQL] json_tuple returns null values within...
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/19017 @viirya PR title fixed, thanks.
[GitHub] spark issue #19017: SPARK-21804: json_tuple returns null values within repea...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19017 Please edit the PR title as `[SPARK-21804][SQL] json_tuple returns ...`.
[GitHub] spark issue #19022: [Spark-21807][SQL]Override ++ operation in ExpressionSet...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19022 Merged build finished. Test PASSed.
[GitHub] spark issue #19022: [Spark-21807][SQL]Override ++ operation in ExpressionSet...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19022 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81057/ Test PASSed.
[GitHub] spark issue #19022: [Spark-21807][SQL]Override ++ operation in ExpressionSet...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19022 **[Test build #81057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81057/testReport)** for PR 19022 at commit [`0762840`](https://github.com/apache/spark/commit/07628402eeed958c45905974c82b06211f1bc934).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable m...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19031#discussion_r134918924

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -577,9 +577,11 @@ object SQLConf {
       .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
         "When the generated function exceeds this threshold, " +
         "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-        "The default value 4000 is the max length of byte code JIT supported " +
-        "for a single function(8000) divided by 2.")
+        "The default value 2667 is the max length of byte code JIT supported " +
--- End diff --

2667?
[GitHub] spark pull request #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable m...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19031#discussion_r13491

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -577,9 +577,11 @@ object SQLConf {
       .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
         "When the generated function exceeds this threshold, " +
         "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-        "The default value 4000 is the max length of byte code JIT supported " +
-        "for a single function(8000) divided by 2.")
+        "The default value 2667 is the max length of byte code JIT supported " +
+        "for a single function(8000) divided by 2. Use -1 to disable this.")
       .intConf
+      .checkValue(maxLines => maxLines >= -1, "The maximum must not be a negative integer, -1 to " +
+        "always activate whole-stage codegen.")
--- End diff --

`The maximum must not be a negative integer, except for -1 using to always activate whole-stage codegen.`
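The validation contract being discussed (any non-negative threshold, plus `-1` as a disable sentinel) can be sketched as a plain predicate. `MaxLinesConf` and `validateMaxLines` below are illustrative helpers, not Spark's `ConfigBuilder`/`checkValue` API:

```scala
// Validates a "max lines" setting: -1 disables the check, any value >= 0 is
// a real threshold, anything below -1 is rejected -- the same predicate as
// `maxLines => maxLines >= -1` in the SQLConf diff under review.
object MaxLinesConf {
  def validateMaxLines(maxLines: Int): Int = {
    require(maxLines >= -1,
      "The maximum must not be a negative integer, except for -1, " +
        "which disables the check and always activates whole-stage codegen.")
    maxLines
  }
  def isCheckDisabled(maxLines: Int): Boolean = maxLines == -1
}
```

For example, `validateMaxLines(2667)` and `validateMaxLines(-1)` pass, while `validateMaxLines(-2)` throws; callers then treat `-1` as "no limit".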
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19031 I'd prefer using `-1` to disable the `maxLinesPerFunction` check like this.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81064/testReport)** for PR 18730 at commit [`72aef67`](https://github.com/apache/spark/commit/72aef679b498bb042ecb9ffa8df62ed41e1f519d).
[GitHub] spark issue #19028: [MINOR][SQL] The comment of Class ExchangeCoordinator ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19028 **[Test build #81063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81063/testReport)** for PR 19028 at commit [`837536f`](https://github.com/apache/spark/commit/837536fee8427f9b527ace401924f9a703ba38d7).
[GitHub] spark issue #19028: [MINOR][SQL] The comment of Class ExchangeCoordinator ex...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19028 ok to test
[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...
Github user vinodkc commented on the issue: https://github.com/apache/spark/pull/19008 @rxin, sure, I'll update it.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81054/ Test PASSed.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Merged build finished. Test PASSed.
[GitHub] spark issue #19013: [SPARK-21728][core] Allow SparkSubmit to use Logging.
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19013 LGTM.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #81054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81054/testReport)** for PR 18652 at commit [`793dac4`](https://github.com/apache/spark/commit/793dac4403926fb9f1421f4bbee59a8e9b82d7e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18969: [SPARK-21520][SQL][FOLLOW-UP]fix a special case f...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18969#discussion_r134915918

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -24,6 +24,24 @@ import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._

 /**
+ * A pattern that matches any number of project if fields is deterministic
+ * or child is LeafNode of project on top of another relational operator.
+ */
+object ProjectOperation extends PredicateHelper {
+  type ReturnType = (Seq[NamedExpression], LogicalPlan)
+
+  def unapply(plan: LogicalPlan): Option[ReturnType] = plan match {
+    case Project(fields, child) if fields.forall(_.deterministic) =>
+      Some((fields, child))
+
+    case Project(fields, child: LeafNode) =>
--- End diff --

Hi, @gatorsmile. Could you review it again?
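The `ProjectOperation` object in the diff is a Scala extractor: its `unapply` lets planner rules pattern-match and destructure a plan node. A self-contained sketch of the same technique, with toy `Plan`/`Scan`/`Project` classes standing in for Catalyst's `LogicalPlan` hierarchy (the determinism guard is omitted for brevity):

```scala
// Toy plan nodes standing in for Catalyst's LogicalPlan hierarchy.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan

// An extractor in the style of ProjectOperation: matches a Project node and
// returns its fields and child, so rules can write `case ProjectOp(f, c) =>`.
object ProjectOp {
  def unapply(plan: Plan): Option[(Seq[String], Plan)] = plan match {
    case Project(fields, child) => Some((fields, child))
    case _                      => None
  }
}

object ExtractorDemo extends App {
  val plan: Plan = Project(Seq("a", "c"), Scan("t1"))
  plan match {
    case ProjectOp(fields, child) => println(s"${fields.mkString(",")} over $child")
    case _                        => println("no match")
  }
  // prints: a,c over Scan(t1)
}
```

The real extractor adds guards (all fields deterministic, or the child is a `LeafNode`) so that only safe `Project` nodes are collapsed by the rule.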
[GitHub] spark issue #19021: [SPARK-21603][SQL][FOLLOW-UP] Change the default value o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19021 @maropu Should be a good idea, especially since the number of lines of code may not be an intuitive knob to set for this purpose.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19031 **[Test build #81062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81062/testReport)** for PR 19031 at commit [`9438655`](https://github.com/apache/spark/commit/94386550523baf5f98427d3ef0b9f9815cee4c69).
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81055/ Test FAILed.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19031 Merged build finished. Test FAILed.
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19031 **[Test build #81055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81055/testReport)** for PR 19031 at commit [`60dc64e`](https://github.com/apache/spark/commit/60dc64e3dc9ad5de5604ea68d3bb5cf7defc4553).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18730 **[Test build #81061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81061/testReport)** for PR 18730 at commit [`bab91db`](https://github.com/apache/spark/commit/bab91db933947b57159b21e5f6506570b6b721cb).
[GitHub] spark pull request #18730: [SPARK-21527][CORE] Use buffer limit in order to ...
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/18730#discussion_r134912125

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,6 +65,19 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
   }

   /**
+   * Write this buffer to a channel with slice.
+   */
+  def writeWithSlice(channel: WritableByteChannel): Unit = {
+    for (bytes <- getChunks()) {
+      val capacity = bytes.limit()
+      while (bytes.position() < capacity) {
+        bytes.limit(Math.min(capacity, bytes.position + NIO_BUFFER_LIMIT.toInt))
--- End diff --

Good review. I refactored the code.
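The loop in the diff writes a large `ByteBuffer` in bounded slices by repeatedly moving the buffer's limit forward, so no single `write` call exceeds a cap. A standalone sketch of the technique (the 256 KB `NioBufferLimit` is illustrative, not Spark's actual `NIO_BUFFER_LIMIT`):

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.{Channels, WritableByteChannel}

object ChunkedWrite {
  // Illustrative cap on bytes per write call (not Spark's actual constant).
  val NioBufferLimit: Int = 256 * 1024

  // Write `bytes` to `channel` at most NioBufferLimit bytes per call by
  // sliding the buffer's limit forward; each write advances the position,
  // so the loop terminates when the whole buffer has been written.
  def writeWithSlice(bytes: ByteBuffer, channel: WritableByteChannel): Unit = {
    val capacity = bytes.limit()
    while (bytes.position() < capacity) {
      bytes.limit(math.min(capacity, bytes.position() + NioBufferLimit))
      channel.write(bytes)
    }
  }
}

object ChunkedWriteDemo extends App {
  val out = new ByteArrayOutputStream()
  val channel = Channels.newChannel(out)
  ChunkedWrite.writeWithSlice(ByteBuffer.wrap(new Array[Byte](1000000)), channel)
  println(out.size()) // prints 1000000: all bytes arrive despite the sliced writes
}
```

Capping the slice size sidesteps JVM implementations that fall off a fast path (or allocate large temporary native buffers) when handed one huge buffer in a single `write` call.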
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81060/testReport)** for PR 18581 at commit [`47e8d37`](https://github.com/apache/spark/commit/47e8d3761681611a9ee6d50d6c812babd395dace).
[GitHub] spark issue #18730: [SPARK-21527][CORE] Use buffer limit in order to use JAV...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/18730 @jiangxb1987 OK, I will try to do some benchmark testing.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81059/testReport)** for PR 18581 at commit [`3555e5d`](https://github.com/apache/spark/commit/3555e5dafa85dcee404599c78b17cbb97b1709f0).
[GitHub] spark issue #19021: [SPARK-21603][SQL][FOLLOW-UP] Change the default value o...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19021 Just for your info, again, I looked into this issue in TPC-DS queries; I added [some code](https://github.com/apache/spark/compare/master...maropu:SPARK-21603-FOLLOWUP-3) to check the actual bytecode size of these queries, and I found that only the gen'd functions in Q17/Q66 had bytecode over `8000`:
```
= TPCDS QUERY BENCHMARK OUTPUT FOR q17 =
17/08/23 14:45:02 WARN CodeGenerator: GeneratedClass.agg_doAggregateWithKeys is too large to do JIT compilation on HotSpot; the size of agg_doAggregateWithKeys is 17665; the limit is 8000

= TPCDS QUERY BENCHMARK OUTPUT FOR q66 =
17/08/23 14:55:39 WARN CodeGenerator: GeneratedClass.agg_doAggregateWithKeys is too large to do JIT compilation on HotSpot; the size of agg_doAggregateWithKeys is 11012; the limit is 8000
17/08/23 14:55:39 WARN CodeGenerator: GeneratedClass.agg_doAggregateWithKeys is too large to do JIT compilation on HotSpot; the size of agg_doAggregateWithKeys is 13420; the limit is 8000
17/08/23 14:55:39 WARN CodeGenerator: GeneratedClass.agg_doAggregateWithKeys is too large to do JIT compilation on HotSpot; the size of agg_doAggregateWithKeys is 16641; the limit is 8000
```
BTW, why don't we check whether the gen'd bytecode size is over `8000` directly, instead of counting code lines, in #18810? cc: @gatorsmile @viirya @kiszk
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652
```
Join [t1.a = rand(t2.b), t1.c = rand(t2.d)]
  Sort
    Project [t1.a, t1.c]
      TableScan t1
  Sort
    Project [rand(t2.b) as rand(t2.b), rand(t2.d) as rand(t2.d)]
      TableScan t2
```
Aren't `rand(t2.b)` and `rand(t2.d)` already evaluated in `Project`? Why would `Sort` change the evaluation order?
[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Merged build finished. Test PASSed.