[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22615#discussion_r222069855

--- Diff: docs/building-spark.md ---
@@ -49,25 +49,20 @@ To create a Spark distribution like those distributed by the
 to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:

-./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes

 This will build Spark distribution along with Python pip and R packages. For more information
 on usage, run `./dev/make-distribution.sh --help`

 ## Specifying the Hadoop Version and Enabling YARN

 You can specify the exact version of Hadoop to compile against through the `hadoop.version` property.
-If unset, Spark will build against Hadoop 2.6.X by default.

 You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different
 from `hadoop.version`.

-Examples:
+Example:

-# Apache Hadoop 2.6.X
-./build/mvn -Pyarn -DskipTests clean package
-
-# Apache Hadoop 2.7.X and later
-./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
+./build/mvn -Pyarn -Dhadoop.version=2.7.7 -DskipTests clean package

--- End diff --

SPARK-25015 is reverted by SPARK-25330. Is it safe to give an example with `2.7.7` without some warnings?
```
b0ada7dce0 [SPARK-25330][BUILD][BRANCH-2.3] Revert Hadoop 2.7 to 2.7.3
5f9633dc97 [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7
```

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22614
**[Test build #96868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96868/testReport)** for PR 22614 at commit [`dddffca`](https://github.com/apache/spark/commit/dddffcae8824e72d614fd6202e7fc562c490098b).
[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22614
ok to test
[GitHub] spark pull request #22602: [SPARK-25538][SQL] Zero-out all bytes when writin...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22602#discussion_r222067876

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriterSuite.scala ---
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.codegen
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.types.Decimal
+
+class UnsafeWriterSuite extends SparkFunSuite {
--- End diff --

I don't think it is necessary, as we may also want to include tests for other `UnsafeWriter`s here in the future.
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22605
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3637/
Test PASSed.
[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22615
**[Test build #96867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96867/testReport)** for PR 22615 at commit [`3b313bb`](https://github.com/apache/spark/commit/3b313bb83c84429b4d5840055523b1ca48489d19).
[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22615
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3636/
Test PASSed.
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22605
Merged build finished. Test PASSed.
[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22615
Merged build finished. Test PASSed.
[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...
Github user kmanamcheri commented on the issue: https://github.com/apache/spark/pull/22614
@mallman @cloud-fan @ericl @rezasafi
[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...
GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/22615

    [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

## What changes were proposed in this pull request?

Remove Hadoop 2.6 references and make 2.7 the default. Obviously, this is for master/3.0.0 only. After this we can also get rid of the separate test jobs for Hadoop 2.6.

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-25016

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22615.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22615

commit 3b313bb83c84429b4d5840055523b1ca48489d19
Author: Sean Owen
Date: 2018-10-02T18:37:40Z

    Remove Hadoop 2.6 references and make 2.7 the default
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22605
**[Test build #96866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96866/testReport)** for PR 22605 at commit [`6e917e5`](https://github.com/apache/spark/commit/6e917e57db33a92274e79de9906194fb650a9171).
[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22614
Can one of the admins verify this patch?
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22606
Thanks, @srowen and @gatorsmile. That sounds good to me.
[GitHub] spark pull request #22614: HiveClient.getPartitionsByFilter should not throw...
GitHub user kmanamcheri opened a pull request:

    https://github.com/apache/spark/pull/22614

    HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql

## What changes were proposed in this pull request?

When using partition filter pushdown to HMS, Spark should expect a MetaException from HMS if partition filtering is not supported and should call getAllPartitions instead. HMS is expected to throw a MetaException even if directSql is enabled.

## How was this patch tested?

Unit tests on the Spark SQL component were run successfully.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kmanamcheri/spark SPARK-25561

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22614.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22614

commit dddffcae8824e72d614fd6202e7fc562c490098b
Author: Karthik Manamcheri
Date: 2018-10-02T16:11:20Z

    HiveShim should expect MetaException from HMS in all cases
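The fallback behavior this PR describes — try the pushed-down partition filter first, and fall back to fetching all partitions when the metastore throws a `MetaException` — can be sketched as below. This is a hedged illustration, not Spark's actual `HiveShim` code: the `PartitionFetchSketch` class, the nested `MetaException` stand-in, and the `Supplier` parameters are all hypothetical names used only to show the control flow.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class PartitionFetchSketch {
    // Stand-in for Hive's org.apache.hadoop.hive.metastore.api.MetaException.
    static class MetaException extends RuntimeException {
        MetaException(String msg) { super(msg); }
    }

    // Try the pushed-down filter first; on MetaException (which, per the PR,
    // HMS may throw even when direct SQL is enabled), fall back to fetching
    // all partitions instead of failing the query.
    static List<String> getPartitionsByFilter(Supplier<List<String>> fetchFiltered,
                                              Supplier<List<String>> fetchAll) {
        try {
            return fetchFiltered.get();
        } catch (MetaException e) {
            return fetchAll.get();
        }
    }

    public static void main(String[] args) {
        // Simulate a metastore that rejects filter pushdown.
        List<String> result = getPartitionsByFilter(
            () -> { throw new MetaException("filter pushdown unsupported"); },
            () -> Arrays.asList("p=1", "p=2"));
        System.out.println(result); // prints [p=1, p=2]
    }
}
```

The design point under review is exactly this catch clause: whether Spark should treat `MetaException` as "retry without the filter" in all cases, or only when direct SQL is disabled.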
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r222061609

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.hadoop.security.UserGroupInformation
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[kafka010] object KafkaSecurityHelper extends Logging {
+  def getKeytabJaasParams(sparkConf: SparkConf): Option[String] = {
+    if (sparkConf.get(KEYTAB).nonEmpty) {
+      Some(getKrbJaasParams(sparkConf))
+    } else {
+      None
+    }
+  }
+
+  def getKrbJaasParams(sparkConf: SparkConf): String = {
+    val serviceName = sparkConf.get(KAFKA_KERBEROS_SERVICE_NAME)
+    require(serviceName.nonEmpty, "Kerberos service name must be defined")
+    val keytab = sparkConf.get(KEYTAB)
+    require(keytab.nonEmpty, "Keytab must be defined")
+    val principal = sparkConf.get(PRINCIPAL)
+    require(principal.nonEmpty, "Principal must be defined")
+
+    val params =
+      s"""
+      |com.sun.security.auth.module.Krb5LoginModule required
--- End diff --

There is a whole section in UGI code related to IBM JVMs changing the classnames of kerberos stuff. Just assume that any class with sun or oracle in the name will be different there and you won't be disappointed.
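A minimal sketch of the reviewer's point: IBM JVMs ship the Kerberos login module under a different class name, so a JAAS string should not hard-code the `com.sun.*` class. The vendor check below, the `JaasParamsSketch` class, and the exact set of JAAS option lines are illustrative assumptions, not the PR's code; only the two module class names are intended to reflect real JVMs.

```java
public class JaasParamsSketch {
    // IBM JVMs provide the Kerberos login module under com.ibm.*, not com.sun.*,
    // which is why hard-coding the Sun/Oracle class name breaks there.
    static String krb5LoginModuleClass(String javaVendor) {
        return javaVendor.contains("IBM")
            ? "com.ibm.security.auth.module.Krb5LoginModule"
            : "com.sun.security.auth.module.Krb5LoginModule";
    }

    // Build a JAAS configuration fragment; option names here are illustrative.
    static String krbJaasParams(String javaVendor, String keytab,
                                String principal, String serviceName) {
        return krb5LoginModuleClass(javaVendor) + " required\n"
            + " useKeyTab=true\n"
            + " keyTab=\"" + keytab + "\"\n"
            + " principal=\"" + principal + "\"\n"
            + " serviceName=\"" + serviceName + "\";";
    }

    public static void main(String[] args) {
        // The running JVM's vendor decides which module class is emitted.
        System.out.println(krbJaasParams(System.getProperty("java.vendor"),
            "/etc/security/spark.keytab", "spark/host@EXAMPLE.COM", "kafka"));
    }
}
```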
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22606
Yes. Let us wait for the release of 2.4
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22602
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96859/
Test PASSed.
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22602
Merged build finished. Test PASSed.
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22602
**[Test build #96859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96859/testReport)** for PR 22602 at commit [`72b7c5c`](https://github.com/apache/spark/commit/72b7c5c16b33368bb332b168a99e99e13d4636cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22606
You mean http://spark.apache.org/versioning-policy.html and the reference to 2.4? I think that's still valid. When 2.4 is released, I'd propose to change that to refer to 3.0 being released .. I dunno .. around Mar 2019?
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22488
Merged build finished. Test FAILed.
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22488
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96864/
Test FAILed.
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22500
**[Test build #96863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96863/testReport)** for PR 22500 at commit [`58fbdd6`](https://github.com/apache/spark/commit/58fbdd6ef0d51a8dc83b73a33e97563958dfae24).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22500
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96863/
Test FAILed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503
**[Test build #96865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96865/testReport)** for PR 22503 at commit [`695f676`](https://github.com/apache/spark/commit/695f6760e239a781ad8fb0b1e428944e73f79563).
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22500
Merged build finished. Test FAILed.
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22491
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96862/
Test FAILed.
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22491
**[Test build #96862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96862/testReport)** for PR 22491 at commit [`1f492ee`](https://github.com/apache/spark/commit/1f492ee4dfde009bae23cc55ebd31061120b054e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222045619

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+
+private object ResourceRequestValidator {
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if a standard resource (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   *
+   * Example of an invalid config:
+   *  - spark.yarn.driver.resource.memory=2g
+   *
+   * Please note that if multiple resources are defined like described above,
+   * the error messages will be concatenated.
+   * Example of such a config:
+   *  - spark.yarn.driver.resource.memory=2g
+   *  - spark.yarn.executor.resource.cores=2
+   * Then the following two error messages will be printed:
+   *  - "memory cannot be requested with config spark.yarn.driver.resource.memory,
+   *    please use config spark.driver.memory instead!"
+   *  - "cores cannot be requested with config spark.yarn.executor.resource.cores,
+   *    please use config spark.executor.cores instead!"
+   *
+   * @param sparkConf
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+    val resourceDefinitions = Seq[(String, String)](
+      (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
+      (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"),
+      ("spark.driver.memory", YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

There is `DRIVER_MEMORY` and `EXECUTOR_MEMORY`.
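The validation described in the diff — reject standard resources requested through the `spark.yarn.*.resource.*` namespace and report all violations together rather than failing on the first one — can be sketched as below. This is a hedged illustration: the class name, the plain `Map`-based config, and the exact message wording are hypothetical; the real code reads a `SparkConf` and, as the reviewer notes, should use config constants like `DRIVER_MEMORY` and `EXECUTOR_MEMORY` rather than raw key strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResourceValidatorSketch {
    // Standard resources (memory, cores) must be set via the normal Spark
    // configs, not the YARN custom-resource namespace. Each entry pairs a
    // disallowed key with the config the user should use instead.
    static List<String> validate(Map<String, String> conf) {
        String[][] disallowed = {
            {"spark.yarn.driver.resource.memory", "spark.driver.memory"},
            {"spark.yarn.driver.resource.cores", "spark.driver.cores"},
            {"spark.yarn.executor.resource.memory", "spark.executor.memory"},
            {"spark.yarn.executor.resource.cores", "spark.executor.cores"},
        };
        // Collect every violation so all error messages can be reported
        // together, matching the concatenation behavior the doc describes.
        List<String> errors = new ArrayList<>();
        for (String[] pair : disallowed) {
            if (conf.containsKey(pair[0])) {
                errors.add("Error: " + pair[0] + " is not allowed, please use "
                    + pair[1] + " instead!");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.yarn.driver.resource.memory", "2g");
        conf.put("spark.yarn.executor.resource.cores", "2");
        validate(conf).forEach(System.out::println); // prints two Error: lines
    }
}
```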
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222049295

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
     appContext.getMaxAppAttempts should be (42)
   }

+  test("Resource type args propagate, resource type not defined") {
+    assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+    val sparkConf = new SparkConf()
+      .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m")
+    val args = new ClientArguments(Array())
+
+    val appContext = Records.newRecord(classOf[ApplicationSubmissionContext])
+    val getNewApplicationResponse = Records.newRecord(classOf[GetNewApplicationResponse])
+    val containerLaunchContext = Records.newRecord(classOf[ContainerLaunchContext])
+
+    val client = new Client(args, sparkConf)
+
+    try {
+      client.createApplicationSubmissionContext(
+        new YarnClientApplication(getNewApplicationResponse, appContext),
+        containerLaunchContext)
+    } catch {
+      case NonFatal(e) =>
+        val expectedExceptionClass = "org.apache.hadoop.yarn.exceptions.ResourceNotFoundException"
+        if (e.getClass.getName != expectedExceptionClass) {
+          fail(s"Exception caught: $e is not an instance of $expectedExceptionClass!")
+        }
+    }
+  }
+
+  test("Resource type args propagate (client mode)") {
+    assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+    TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu", "fpga"))
+
+    val sparkConf = new SparkConf()
+      .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m")
--- End diff --

I'm a little confused, shouldn't this cause an exception like in the test above?
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222049970

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
     appContext.getMaxAppAttempts should be (42)
   }

+  test("Resource type args propagate, resource type not defined") {
+    assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+    val sparkConf = new SparkConf()
+      .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m")
+    val args = new ClientArguments(Array())
+
+    val appContext = Records.newRecord(classOf[ApplicationSubmissionContext])
+    val getNewApplicationResponse = Records.newRecord(classOf[GetNewApplicationResponse])
+    val containerLaunchContext = Records.newRecord(classOf[ContainerLaunchContext])
+
+    val client = new Client(args, sparkConf)
+
+    try {
+      client.createApplicationSubmissionContext(
+        new YarnClientApplication(getNewApplicationResponse, appContext),
+        containerLaunchContext)
+    } catch {
+      case NonFatal(e) =>
+        val expectedExceptionClass = "org.apache.hadoop.yarn.exceptions.ResourceNotFoundException"
+        if (e.getClass.getName != expectedExceptionClass) {
+          fail(s"Exception caught: $e is not an instance of $expectedExceptionClass!")
+        }
+    }
+  }
+
+  test("Resource type args propagate (client mode)") {
+    assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+    TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu", "fpga"))
+
+    val sparkConf = new SparkConf()
+      .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m")
+      .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "122m")
+      .set(YARN_AM_RESOURCE_TYPES_PREFIX + "fpga", "222m")
+      .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "fpga", "223m")
+      .set(YARN_AM_RESOURCE_TYPES_PREFIX + "memory", "1G")
+      .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory", "2G")
+    val args = new ClientArguments(Array())
+
+    val appContext = Records.newRecord(classOf[ApplicationSubmissionContext])
+    val getNewApplicationResponse = Records.newRecord(classOf[GetNewApplicationResponse])
+    val containerLaunchContext = Records.newRecord(classOf[ContainerLaunchContext])
+
+    val client = new Client(args, sparkConf)
+    client.createApplicationSubmissionContext(
+      new YarnClientApplication(getNewApplicationResponse, appContext),
+      containerLaunchContext)
+
+    appContext.getAMContainerSpec should be (containerLaunchContext)
+    appContext.getApplicationType should be ("SPARK")
+    TestYarnResourceRequestHelper.getResourceTypeValue(appContext.getResource,
+      "some_resource_with_units_1") should be (121)
+    TestYarnResourceRequestHelper
+      .getResourceTypeValue(appContext.getResource, "fpga") should be (222)
+  }
+
+  test("configuration and resource type args propagate (cluster mode)") {
--- End diff --

This is basically copy & pasted from the above. Better to create a helper method with parameters for what to check.
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22606
It's about 3.0.0. So, I guess PMC knows the detail and the future. :)
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222053715

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala ---
@@ -134,6 +166,42 @@ class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfter
     size should be (0)
   }

+  test("custom resource type requested from yarn") {
+    assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+    TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu"))
+
+    // request a single container and receive it
+    val handler = createAllocatorWithAdditionalConfigs(1, Map(
+      YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "gpu" -> "2G",
+      YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory" -> "1G"
+    ))
+    handler.updateResourceRequests()
+    handler.getNumExecutorsRunning should be (0)
+    handler.getPendingAllocate.size should be (1)
+
+    val resource = Resource.newInstance(3072, 6)
+    val resourceTypes = Map("gpu" -> "2G")
+    ResourceRequestHelper.setResourceRequests(resourceTypes, resource)
+
+    val container = createContainerWithResource("host1", resource)
+    handler.handleAllocatedContainers(Array(container))
+
+    // verify custom resource type is part of rmClient.ask set
+    val askField = rmClient.getClass.getDeclaredField("ask")
--- End diff --

I'm not sure this is testing anything interesting. It's basically testing YARN code, not Spark code, and using reflection for that... The code you're actually adding to `YarnAllocator` is not being tested, which is the code that creates the `resource` field in that class. Instead, you're manually creating a `Resource` in this test. Testing that code would be better than what you have here.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222052506

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/TestYarnResourceRequestHelper.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+object TestYarnResourceRequestHelper extends Logging {
+  def initializeResourceTypes(resourceTypes: List[String]): Unit = {
+    if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) {
+      throw new IllegalStateException("initializeResourceTypes() should not be invoked " +
+        "since YARN resource types is not available because of old Hadoop version!")
+    }
+
+    val allResourceTypes = new ListBuffer[AnyRef]
+    val defaultResourceTypes = List(
+      createResourceTypeInfo("memory-mb"),
+      createResourceTypeInfo("vcores"))
+    val customResourceTypes = resourceTypes.map(rt => createResourceTypeInfo(rt))
+    allResourceTypes ++= defaultResourceTypes
+    allResourceTypes ++= customResourceTypes
+
+    reinitializeResources(allResourceTypes)
+  }
+
+  private def createResourceTypeInfo(resourceName: String): AnyRef = {
+    val resTypeInfoClass = Utils.classForName("org.apache.hadoop.yarn.api.records.ResourceTypeInfo")
+    val resTypeInfoNewInstanceMethod = resTypeInfoClass.getMethod("newInstance", classOf[String])
+    resTypeInfoNewInstanceMethod.invoke(null, resourceName)
+  }
+
+  private def reinitializeResources(resourceTypes: ListBuffer[AnyRef]): Unit = {
+    val resourceUtilsClass =
+      Utils.classForName("org.apache.hadoop.yarn.util.resource.ResourceUtils")
+    val reinitializeResourcesMethod = resourceUtilsClass.getMethod("reinitializeResources",
+      classOf[java.util.List[AnyRef]])
+    reinitializeResourcesMethod.invoke(null, resourceTypes.asJava)
+  }
+
+  def getResourceTypeValue(res: Resource, name: String): AnyRef = {
+    val resourceInformation: AnyRef = getResourceInformation(res, name)
+    invokeMethod(resourceInformation, "getValue")
+  }
+
+  def getResourceInformationByName(res: Resource, nameParam: String): ResourceInformation = {
+    val resourceInformation: AnyRef = getResourceInformation(res, nameParam)
+    val name = invokeMethod(resourceInformation, "getName").asInstanceOf[String]
+    val value = invokeMethod(resourceInformation, "getValue").asInstanceOf[Long]
+    val units = invokeMethod(resourceInformation, "getUnits").asInstanceOf[String]
+    new ResourceInformation(name, value, units)
+  }
+
+  private def getResourceInformation(res: Resource, name: String) = {
+    if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) {
+      throw new IllegalStateException("assertResourceTypeValue() should not be invoked " +
+        "since yarn resource types is not available because of
old Hadoop version!") +} + +val getResourceInformationMethod = res.getClass.getMethod("getResourceInformation", + classOf[String]) +val resourceInformation = getResourceInformationMethod.invoke(res, name) +resourceInformation + } + + private def invokeMethod(resourceInformation: AnyRef, methodName: String): AnyRef = { +val getValueMethod = resourceInformation.getClass.getMethod(methodName) +getValueMethod.invoke(resourceInformation) + } + + class ResourceInformation(val name: String, val value: Long, val units: String) { --- End diff -- Remove braces; make this a case class, then you don't need `val`s and can compare them more easily. `assert(resource === ResourceInformation("blah", 1L, "m"))` ---
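The case-class suggestion above, in a runnable form (a sketch of the shape, not the actual Spark helper):

```scala
// A case class gives structural equality, pattern matching, and toString
// for free, so a test can compare whole values in a single assertion.
case class ResourceInformation(name: String, value: Long, units: String)

object ResourceInformationDemo {
  def main(args: Array[String]): Unit = {
    val parsed = ResourceInformation("gpu", 2L, "G")
    // equality is field-by-field; no hand-written equals/hashCode needed
    assert(parsed == ResourceInformation("gpu", 2L, "G"))
    assert(parsed != ResourceInformation("gpu", 3L, "G"))
    println(parsed) // prints: ResourceInformation(gpu,2,G)
  }
}
```

This is exactly what enables the suggested `assert(resource === ResourceInformation("blah", 1L, "m"))` style in ScalaTest.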
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222048057 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers { appContext.getMaxAppAttempts should be (42) } + test("Resource type args propagate, resource type not defined") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +val sparkConf = new SparkConf() + .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m") +val args = new ClientArguments(Array()) + +val appContext = Records.newRecord(classOf[ApplicationSubmissionContext]) +val getNewApplicationResponse = Records.newRecord(classOf[GetNewApplicationResponse]) +val containerLaunchContext = Records.newRecord(classOf[ContainerLaunchContext]) + +val client = new Client(args, sparkConf) + +try { + client.createApplicationSubmissionContext( +new YarnClientApplication(getNewApplicationResponse, appContext), +containerLaunchContext) +} catch { + case NonFatal(e) => +val expectedExceptionClass = "org.apache.hadoop.yarn.exceptions.ResourceNotFoundException" +if (e.getClass.getName != expectedExceptionClass) { --- End diff -- `assert(blah === blah2)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
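A small sketch of the review point: compare the caught exception's class with one assertion instead of an `if` plus manual failure handling. The helper below is illustrative, not Spark code:

```scala
// Capture the class name of whatever an operation throws, then assert on
// it directly, replacing the if/else branch in the original test.
object AssertStyleDemo {
  def exceptionClassOf(op: => Unit): String =
    try { op; "none" } catch { case e: Throwable => e.getClass.getName }

  def main(args: Array[String]): Unit = {
    val expected = "java.lang.IllegalArgumentException"
    val actual = exceptionClassOf(throw new IllegalArgumentException("bad resource"))
    assert(actual == expected)
  }
}
```

In a ScalaTest suite the same idea is usually spelled `intercept[ExpectedException] { ... }` followed by assertions on the result.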
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22491 Merged build finished. Test FAILed.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222043627 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Integer => JInteger, Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + + def setResourceRequests( + resources: Map[String, String], + resource: Resource): Unit = { +require(resource != null, "Resource parameter should not be null!") + +logDebug(s"Custom resource types requested: $resources") +if (!isYarnResourceTypesAvailable()) { + if (resources.nonEmpty) { +logWarning("Ignoring updating resource with resource types because " + +"the version of YARN does not support it!") + } + return +} + +val resInfoClass = Utils.classForName( + "org.apache.hadoop.yarn.api.records.ResourceInformation") +val setResourceInformationMethod = + resource.getClass.getMethod("setResourceInformation", classOf[String], +resInfoClass) --- End diff -- Fits in previous line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222043765 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Integer => JInteger, Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + + def setResourceRequests( + resources: Map[String, String], + resource: Resource): Unit = { +require(resource != null, "Resource parameter should not be null!") + +logDebug(s"Custom resource types requested: $resources") +if (!isYarnResourceTypesAvailable()) { + if (resources.nonEmpty) { +logWarning("Ignoring updating resource with resource types because " + +"the version of YARN does not support it!") + } + return +} + +val resInfoClass = Utils.classForName( + "org.apache.hadoop.yarn.api.records.ResourceInformation") --- End diff -- Fits in previous line. You also have the same class name hardcoded below, might as well create a constant. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
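A self-contained sketch of the "create a constant" suggestion for the repeated reflective class name. Here `java.lang.StringBuilder` stands in for the Hadoop 3-only `ResourceInformation` class, so the example runs anywhere:

```scala
// Hoist the class name looked up by reflection into one constant instead
// of repeating the string literal at each call site.
object ReflectiveConstantDemo {
  private val RES_INFO_CLASS = "java.lang.StringBuilder" // stand-in name

  def newInstance(initial: String): AnyRef = {
    val cls = Class.forName(RES_INFO_CLASS)
    cls.getConstructor(classOf[String]).newInstance(initial).asInstanceOf[AnyRef]
  }

  def main(args: Array[String]): Unit = {
    val instance = newInstance("gpu=2G")
    assert(instance.toString == "gpu=2G")
  }
}
```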
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222047253 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -35,13 +36,13 @@ import org.apache.hadoop.yarn.conf.YarnConfiguration import org.apache.hadoop.yarn.util.Records import org.mockito.Matchers.{eq => meq, _} import org.mockito.Mockito._ -import org.scalatest.Matchers +import org.scalatest.{BeforeAndAfterAll, Matchers} import org.apache.spark.{SparkConf, SparkFunSuite, TestUtils} import org.apache.spark.deploy.yarn.config._ import org.apache.spark.util.{SparkConfWithEnv, Utils} -class ClientSuite extends SparkFunSuite with Matchers { +class ClientSuite extends SparkFunSuite with Matchers with BeforeAndAfterAll { --- End diff -- Where are you using `BeforeAndAfterAll`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222044641 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.mutable + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ + +private object ResourceRequestValidator { + private val ERROR_PREFIX: String = "Error:" + + /** + * Validates sparkConf and throws a SparkException if a standard resource (memory or cores) + * is defined with the property spark.yarn.x.resource.y + * + * Example of an invalid config: + * - spark.yarn.driver.resource.memory=2g + * + * Please note that if multiple resources are defined like described above, + * the error messages will be concatenated. 
+ * Example of such a config: + * - spark.yarn.driver.resource.memory=2g + * - spark.yarn.executor.resource.cores=2 + * Then the following two error messages will be printed: + * - "memory cannot be requested with config spark.yarn.driver.resource.memory, + * please use config spark.driver.memory instead! + * - "cores cannot be requested with config spark.yarn.executor.resource.cores, + * please use config spark.executor.cores instead! + * + * @param sparkConf + */ + def validateResources(sparkConf: SparkConf): Unit = { --- End diff -- This is now so small that it can be in the helper class above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
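The validation logic under discussion, sketched in a self-contained form. The config keys below are illustrative stand-ins for the real `config` constants, and the exception type differs from Spark's `SparkException`:

```scala
// Reject standard resources (memory, cores) configured through the
// custom-resource prefix, collecting all offending keys into one error.
object ResourceValidatorDemo {
  private val resourceDefinitions = Seq(
    "spark.driver.memory"  -> "spark.yarn.driver.resource.memory",
    "spark.executor.cores" -> "spark.yarn.executor.resource.cores")

  def validate(conf: Map[String, String]): Unit = {
    val errors = resourceDefinitions.collect {
      case (sparkName, resourceRequest) if conf.contains(resourceRequest) =>
        s"Error: Do not use $resourceRequest, please use $sparkName instead!"
    }
    if (errors.nonEmpty) {
      throw new IllegalArgumentException(errors.mkString("\n"))
    }
  }

  def main(args: Array[String]): Unit = {
    validate(Map("spark.driver.memory" -> "2g")) // standard key: accepted
    val rejected =
      try { validate(Map("spark.yarn.driver.resource.memory" -> "2g")); false }
      catch { case _: IllegalArgumentException => true }
    assert(rejected)
  }
}
```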
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222047755 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers { appContext.getMaxAppAttempts should be (42) } + test("Resource type args propagate, resource type not defined") { --- End diff -- I can't really parse what the test description means... (also in the other tests) Maybe you mean `test("resource request for invalid resource")`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222043939 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Integer => JInteger, Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + + def setResourceRequests( + resources: Map[String, String], + resource: Resource): Unit = { +require(resource != null, "Resource parameter should not be null!") + +logDebug(s"Custom resource types requested: $resources") +if (!isYarnResourceTypesAvailable()) { + if (resources.nonEmpty) { +logWarning("Ignoring updating resource with resource types because " + +"the version of YARN does not support it!") + } + return +} + +val resInfoClass = Utils.classForName( + "org.apache.hadoop.yarn.api.records.ResourceInformation") +val setResourceInformationMethod = + resource.getClass.getMethod("setResourceInformation", classOf[String], +resInfoClass) +resources.foreach { case (name, rawAmount) => + try { +val AMOUNT_AND_UNIT_REGEX(amountPart, unitPart) = rawAmount +val amount = amountPart.toLong +val unit = unitPart match { + case "g" => "G" + case "t" => "T" + case "p" => "P" + case _ => unitPart +} +logDebug(s"Registering resource with name: $name, amount: $amount, unit: $unit") +val resourceInformation = + createResourceInformation(name, amount, unit, resInfoClass) --- End diff -- Fits in previous line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222052098 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/TestYarnResourceRequestHelper.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ListBuffer + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +object TestYarnResourceRequestHelper extends Logging { + def initializeResourceTypes(resourceTypes: List[String]): Unit = { +if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) { + throw new IllegalStateException("initializeResourceTypes() should not be invoked " + +"since YARN resource types is not available because of old Hadoop version!" 
) +} + +val allResourceTypes = new ListBuffer[AnyRef] +val defaultResourceTypes = List( + createResourceTypeInfo("memory-mb"), + createResourceTypeInfo("vcores")) +val customResourceTypes = resourceTypes.map(rt => createResourceTypeInfo(rt)) +allResourceTypes ++= defaultResourceTypes +allResourceTypes ++= customResourceTypes + +reinitializeResources(allResourceTypes) + } + + private def createResourceTypeInfo(resourceName: String): AnyRef = { +val resTypeInfoClass = Utils.classForName("org.apache.hadoop.yarn.api.records.ResourceTypeInfo") +val resTypeInfoNewInstanceMethod = resTypeInfoClass.getMethod("newInstance", classOf[String]) +resTypeInfoNewInstanceMethod.invoke(null, resourceName) + } + + private def reinitializeResources(resourceTypes: ListBuffer[AnyRef]): Unit = { +val resourceUtilsClass = + Utils.classForName("org.apache.hadoop.yarn.util.resource.ResourceUtils") +val reinitializeResourcesMethod = resourceUtilsClass.getMethod("reinitializeResources", + classOf[java.util.List[AnyRef]]) +reinitializeResourcesMethod.invoke(null, resourceTypes.asJava) + } + + def getResourceTypeValue(res: Resource, name: String): AnyRef = { +val resourceInformation: AnyRef = getResourceInformation(res, name) +invokeMethod(resourceInformation, "getValue") + } + + def getResourceInformationByName(res: Resource, nameParam: String): ResourceInformation = { +val resourceInformation: AnyRef = getResourceInformation(res, nameParam) +val name = invokeMethod(resourceInformation, "getName").asInstanceOf[String] +val value = invokeMethod(resourceInformation, "getValue").asInstanceOf[Long] +val units = invokeMethod(resourceInformation, "getUnits").asInstanceOf[String] +new ResourceInformation(name, value, units) + } + + private def getResourceInformation(res: Resource, name: String) = { --- End diff -- missing return type --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222044918 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.mutable + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ + +private object ResourceRequestValidator { + private val ERROR_PREFIX: String = "Error:" + + /** + * Validates sparkConf and throws a SparkException if a standard resource (memory or cores) + * is defined with the property spark.yarn.x.resource.y --- End diff -- Scaladoc does not need HTML tags. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
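To illustrate the Scaladoc point: the same comment can be written with Scaladoc's native list markup instead of HTML tags. The method body here is a trivial stub, purely for illustration:

```scala
// Illustrative only: a doc comment using Scaladoc list syntax, no HTML.
object ScaladocStyleDemo {
  /**
   * Validates the conf and fails if a standard resource (memory or cores)
   * is requested via the custom resource prefix, e.g.:
   *
   *  - spark.yarn.driver.resource.memory=2g
   *  - spark.yarn.executor.resource.cores=2
   */
  def validateResources(conf: Map[String, String]): Unit =
    require(!conf.contains("spark.yarn.driver.resource.memory"))
}
```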
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222051520 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/TestYarnResourceRequestHelper.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ListBuffer + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +object TestYarnResourceRequestHelper extends Logging { --- End diff -- Better name: `ResourceRequestTestHelper`. Also you're not using `Logging` at all. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222050707 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestValidatorSuite.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import org.scalatest.{BeforeAndAfterAll, Matchers} + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} + +class ResourceRequestValidatorSuite extends SparkFunSuite with Matchers with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +super.beforeAll() + } + + test("empty SparkConf should be valid") { +val sparkConf = new SparkConf() +ResourceRequestValidator.validateResources(sparkConf) + } + + test("just normal resources are defined") { +val sparkConf = new SparkConf() +sparkConf.set("spark.driver.memory", "3G") +sparkConf.set("spark.driver.cores", "4") +sparkConf.set("spark.executor.memory", "4G") +sparkConf.set("spark.executor.cores", "2") + +ResourceRequestValidator.validateResources(sparkConf) + } + + test("Memory defined with new config for executor") { +val sparkConf = new SparkConf() +sparkConf.set("spark.yarn.executor.resource.memory", "30G") + +val thrown = intercept[SparkException] { + ResourceRequestValidator.validateResources(sparkConf) +} +thrown.getMessage should include ("spark.yarn.executor.resource.memory") + } + + test("Cores defined with new config for executor") { +val sparkConf = new SparkConf() +sparkConf.set("spark.yarn.executor.resource.cores", "5") + +val thrown = intercept[SparkException] { + ResourceRequestValidator.validateResources(sparkConf) +} +thrown.getMessage should include ("spark.yarn.executor.resource.cores") + } + + test("Memory defined with new config, client mode") { +val sparkConf = new SparkConf() +sparkConf.set("spark.yarn.am.resource.memory", "1G") + +val thrown = intercept[SparkException] { + ResourceRequestValidator.validateResources(sparkConf) +} +thrown.getMessage should include ("spark.yarn.am.resource.memory") + } + + test("Memory defined with new config for driver, cluster mode") { +val sparkConf = new SparkConf() +sparkConf.set("spark.yarn.driver.resource.memory", "1G") + +val thrown = intercept[SparkException] { + 
ResourceRequestValidator.validateResources(sparkConf) +} +thrown.getMessage should include ("spark.yarn.driver.resource.memory") + } + + test("Cores defined with new config, client mode") { +val sparkConf = new SparkConf() +sparkConf.set("spark.driver.memory", "2G") +sparkConf.set("spark.driver.cores", "4") +sparkConf.set("spark.executor.memory", "4G") +sparkConf.set("spark.yarn.am.cores", "2") + +sparkConf.set("spark.yarn.am.resource.cores", "3") + +val thrown = intercept[SparkException] { + ResourceRequestValidator.validateResources(sparkConf) +} +thrown.getMessage should include ("spark.yarn.am.resource.cores") + } + + test("Cores defined with new config for driver, cluster mode") { +val sparkConf = new SparkConf() +sparkConf.set("spark.yarn.driver.resource.cores", "1G") + +val thrown = intercept[SparkException] { + ResourceRequestValidator.validateResources(sparkConf) +} +thrown.getMessage should include ("spark.yarn.driver.resource.cores") + } + + test("various duplicated definitions") { +val sparkConf = new SparkConf() +sparkConf.set("spark.driver.memory", "2G") --- End diff -- You need to use existing constants all over this test code. --- - To
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222046191 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.mutable + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ + +private object ResourceRequestValidator { + private val ERROR_PREFIX: String = "Error:" + + /** + * Validates sparkConf and throws a SparkException if a standard resource (memory or cores) + * is defined with the property spark.yarn.x.resource.y + * + * Example of an invalid config: + * - spark.yarn.driver.resource.memory=2g + * + * Please note that if multiple resources are defined like described above, + * the error messages will be concatenated. 
+ * Example of such a config: + * - spark.yarn.driver.resource.memory=2g + * - spark.yarn.executor.resource.cores=2 + * Then the following two error messages will be printed: + * - "memory cannot be requested with config spark.yarn.driver.resource.memory, + * please use config spark.driver.memory instead! + * - "cores cannot be requested with config spark.yarn.executor.resource.cores, + * please use config spark.executor.cores instead! + * + * @param sparkConf + */ + def validateResources(sparkConf: SparkConf): Unit = { +val resourceDefinitions = Seq[(String, String)]( + (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"), + (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"), + ("spark.driver.memory", YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"), + (DRIVER_CORES.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "cores"), + ("spark.executor.memory", YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory"), + (EXECUTOR_CORES.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "cores")) +val errorMessage = new mutable.StringBuilder() + +resourceDefinitions.foreach { case (sparkName, resourceRequest) => + if (sparkConf.contains(resourceRequest)) { +errorMessage.append(s"$ERROR_PREFIX Do not use $resourceRequest, " + +s"please use $sparkName instead!\n") + } +} + +// throw exception after loop --- End diff -- Unnecessary comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222045360 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import scala.collection.mutable + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ + +private object ResourceRequestValidator { + private val ERROR_PREFIX: String = "Error:" + + /** + * Validates sparkConf and throws a SparkException if a standard resource (memory or cores) + * is defined with the property spark.yarn.x.resource.y + * + * Example of an invalid config: --- End diff -- Everything from here to the end of this comment is superfluous. Especially the part where you mention error messages that don't even match the code anymore... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222044125 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Integer => JInteger, Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + + def setResourceRequests( + resources: Map[String, String], + resource: Resource): Unit = { +require(resource != null, "Resource parameter should not be null!") + +logDebug(s"Custom resource types requested: $resources") +if (!isYarnResourceTypesAvailable()) { + if (resources.nonEmpty) { +logWarning("Ignoring updating resource with resource types because " + +"the version of YARN does not support it!") + } + return +} + +val resInfoClass = Utils.classForName( + "org.apache.hadoop.yarn.api.records.ResourceInformation") +val setResourceInformationMethod = + resource.getClass.getMethod("setResourceInformation", classOf[String], +resInfoClass) +resources.foreach { case (name, rawAmount) => + try { +val AMOUNT_AND_UNIT_REGEX(amountPart, unitPart) = rawAmount +val amount = amountPart.toLong +val unit = unitPart match { + case "g" => "G" + case "t" => "T" + case "p" => "P" + case _ => unitPart +} +logDebug(s"Registering resource with name: $name, amount: $amount, unit: $unit") +val resourceInformation = + createResourceInformation(name, amount, unit, resInfoClass) +setResourceInformationMethod.invoke( + resource, name, resourceInformation.asInstanceOf[AnyRef]) + } catch { +case _: MatchError => throw new IllegalArgumentException( --- End diff -- For multi-line statements, start the case in the next line. 
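The `setResourceRequests` helper above splits each raw value such as `121m` with `AMOUNT_AND_UNIT_REGEX` and upper-cases the single-letter SI suffixes before handing the result to YARN through reflection. The parsing step can be sketched without the JVM reflection part; this Python translation is illustrative only:

```python
import re

# Mirrors AMOUNT_AND_UNIT_REGEX from the diff: digits followed by an
# optional alphabetic unit, e.g. "121m" -> (121, "m"), "2" -> (2, "").
AMOUNT_AND_UNIT = re.compile(r"([0-9]+)([A-Za-z]*)$")

# Lower-case SI-style suffixes are upper-cased, as in the diff's match
# on "g"/"t"/"p"; any other unit is passed through unchanged.
UNIT_NORMALIZATION = {"g": "G", "t": "T", "p": "P"}

def parse_resource_amount(raw):
    """Split a raw resource value into (amount, normalized unit)."""
    match = AMOUNT_AND_UNIT.match(raw)
    if match is None:
        raise ValueError(f"Value of resource is missing or invalid: {raw!r}")
    amount, unit = match.groups()
    return int(amount), UNIT_NORMALIZATION.get(unit, unit)
```

The reflection part of the helper (looking up `ResourceInformation` and `setResourceInformation` by name) exists only so the same source compiles against Hadoop 2.x, which lacks those YARN APIs.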
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r222049404 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers { appContext.getMaxAppAttempts should be (42) } + test("Resource type args propagate, resource type not defined") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +val sparkConf = new SparkConf() + .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m") +val args = new ClientArguments(Array()) + +val appContext = Records.newRecord(classOf[ApplicationSubmissionContext]) +val getNewApplicationResponse = Records.newRecord(classOf[GetNewApplicationResponse]) +val containerLaunchContext = Records.newRecord(classOf[ContainerLaunchContext]) + +val client = new Client(args, sparkConf) + +try { + client.createApplicationSubmissionContext( +new YarnClientApplication(getNewApplicationResponse, appContext), +containerLaunchContext) +} catch { + case NonFatal(e) => +val expectedExceptionClass = "org.apache.hadoop.yarn.exceptions.ResourceNotFoundException" +if (e.getClass.getName != expectedExceptionClass) { + fail(s"Exception caught: $e is not an instance of $expectedExceptionClass!") +} +} + } + + test("Resource type args propagate (client mode)") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu", "fpga")) + +val sparkConf = new SparkConf() + .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "121m") + .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", "122m") + .set(YARN_AM_RESOURCE_TYPES_PREFIX + "fpga", "222m") + .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "fpga", "223m") + .set(YARN_AM_RESOURCE_TYPES_PREFIX + "memory", "1G") --- End diff -- Shouldn't this cause an exception because of the validation? 
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user justinuang commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r222053706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -212,6 +212,8 @@ class CSVOptions( settings.setEmptyValue(emptyValueInRead) settings.setMaxCharsPerColumn(maxCharsPerColumn) settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER) +settings.setLineSeparatorDetectionEnabled(multiLine) --- End diff -- done
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3635/ Test PASSed.
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22491 Merged build finished. Test PASSed.
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22488 Merged build finished. Test PASSed.
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22488 **[Test build #96864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96864/testReport)** for PR 22488 at commit [`7990e13`](https://github.com/apache/spark/commit/7990e139cf307c631eb95a1f7b410a85120b07c5).
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22500 **[Test build #96863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96863/testReport)** for PR 22500 at commit [`58fbdd6`](https://github.com/apache/spark/commit/58fbdd6ef0d51a8dc83b73a33e97563958dfae24).
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3633/ Test PASSed.
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22491 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3634/ Test PASSed.
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22491 **[Test build #96862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96862/testReport)** for PR 22491 at commit [`1f492ee`](https://github.com/apache/spark/commit/1f492ee4dfde009bae23cc55ebd31061120b054e).
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22500 Merged build finished. Test PASSed.
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22605 Merged build finished. Test PASSed.
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3632/ Test PASSed.
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22488 Retest this please.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/22503 @HyukjinKwon is this ready to be merged in, or is there more feedback to be addressed?
[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22491 Retest this please.
[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22500 Retest this please.
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22605 **[Test build #96861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96861/testReport)** for PR 22605 at commit [`fb116f0`](https://github.com/apache/spark/commit/fb116f0589471dc8d0b9787b308fc54fc5004cfc).
[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22605 Rebased to `master` due to https://github.com/apache/spark/pull/22599.
[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user samdvr commented on the issue: https://github.com/apache/spark/pull/22596 Thanks will do!
[GitHub] spark pull request #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user samdvr closed the pull request at: https://github.com/apache/spark/pull/22596
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22606 @dongjoon-hyun Could you help me fix the website?
[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22596 Since it's merged into `branch-2.2`, could you close this PR? GitHub only auto-closes PRs against the `master` branch. :)
[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22596 BTW, I chose the most official one, `Sam Davarnia `. But, it seems that your GitHub is not happy with that. Next time, you had better be consistent in your commit message. ``` Lead-authored-by: Sam Davarnia Co-authored-by: Sam Davarnia <> Co-authored-by: Dongjoon Hyun Co-authored-by: Sam Davarnia Signed-off-by: Dongjoon Hyun ```
[GitHub] spark pull request #22607: [SPARK-24530][followup] run Sphinx with python 3 ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22607
[GitHub] spark issue #22607: [SPARK-24530][followup] run Sphinx with python 3 in dock...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22607 Merging to master.
[GitHub] spark pull request #22599: [SPARK-25581][SQL] Rename method `benchmark` as `...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22599
[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22609 @gengliangwang is right; there are still things that need to be done when the UI is disabled; job, stage and executor info are all still exposed even if the UI is disabled, and the task event callbacks update those. It might be as easy as not adding the task to `liveTasks` in `onTaskStart`, but I haven't really looked in detail.
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 Thanks @viirya, your analysis is correct. Unfortunately, an attribute doesn't have a reference to its dataset, so I don't think this scenario can be solved easily. I believe the good solution would be something like https://github.com/apache/spark/pull/21449 But I would argue that my PR still makes sense: an `a("id") === b("id")` condition in a join where `c` is joined is not expected, and is actually very likely a typo.
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22606 Great, @gatorsmile ! Could you update Spark website properly, too?
[GitHub] spark pull request #22602: [SPARK-25538][SQL] Zero-out all bytes when writin...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22602#discussion_r222032210 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriterSuite.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.types.Decimal + +class UnsafeWriterSuite extends SparkFunSuite { --- End diff -- `UnsafeWriterSuite` -> `UnsafeRowWriterSuite`? Also, renaming the file?
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96857/ Test PASSed.
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22602 Merged build finished. Test PASSed.
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22602 **[Test build #96857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96857/testReport)** for PR 22602 at commit [`64f4ed0`](https://github.com/apache/spark/commit/64f4ed0e286d1a01192400203e167d046cb800f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22612 @squito
[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22602 LGTM
[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96858/ Test PASSed.
[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22596 Merged build finished. Test PASSed.
[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22596 **[Test build #96858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96858/testReport)** for PR 22596 at commit [`07156d4`](https://github.com/apache/spark/commit/07156d4d8d3ac40b47a825ecb4fa01bf7eaae1ad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r222015287 --- Diff: python/pyspark/worker.py --- @@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type): arrow_return_type = to_arrow_type(return_type) def verify_result_length(*a): +import pyarrow as pa result = f(*a) if not hasattr(result, "__len__"): raise TypeError("Return type of the user-defined function should be " "Pandas.Series, but is {}".format(type(result))) if len(result) != len(a[0]): raise RuntimeError("Result vector from pandas_udf was not the required length: " "expected %d, got %d" % (len(a[0]), len(result))) + +# Ensure return type of Pandas.Series matches the arrow return type of the user-defined +# function. Otherwise, we may produce incorrect serialized data. +# Note: for timestamp type, we only need to ensure both types are timestamp because the +# serializer will do conversion. +try: +arrow_type_of_result = pa.from_numpy_dtype(result.dtype) +both_are_timestamp = pa.types.is_timestamp(arrow_type_of_result) and \ +pa.types.is_timestamp(arrow_return_type) +if not both_are_timestamp and arrow_return_type != arrow_type_of_result: +print("WARN: Arrow type %s of return Pandas.Series of the user-defined function's " --- End diff -- No, but like the other print usage in `worker.py`, I think this can be seen in the worker log? This is also useful when testing in the pyspark shell.
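The check being discussed warns when the Arrow type inferred from the returned `Pandas.Series` differs from the declared return type, except when both sides are timestamps, since the serializer converts between timestamp representations. Stripped of the pyarrow specifics, the decision logic can be sketched as follows (the function name and the use of type-name strings are illustrative, not PySpark's API):

```python
def should_warn(declared, actual):
    """Decide whether a dtype-mismatch warning should be printed.

    Two timestamp types never warn: the serializer converts between
    timestamp representations, so that mismatch is harmless.
    """
    both_timestamps = (declared.startswith("timestamp")
                       and actual.startswith("timestamp"))
    return not both_timestamps and declared != actual
```

In the real code the comparison is between `pa.from_numpy_dtype(result.dtype)` and the `arrow_return_type` derived from the UDF's declared return type, with `pa.types.is_timestamp` providing the timestamp carve-out.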
[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22610 Thanks for looking into this @viirya ! I agree that there are lots of cases where casting to another type is intentional and works fine, so this isn't a bug. The only other idea I have is to provide an option to raise an error if the type needs to be cast. That might be possible with pyarrow right now, but I'm not sure how useful it would be.
[GitHub] spark pull request #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22606
[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r222007837 --- Diff: python/pyspark/worker.py --- @@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type): arrow_return_type = to_arrow_type(return_type) def verify_result_length(*a): +import pyarrow as pa result = f(*a) if not hasattr(result, "__len__"): raise TypeError("Return type of the user-defined function should be " "Pandas.Series, but is {}".format(type(result))) if len(result) != len(a[0]): raise RuntimeError("Result vector from pandas_udf was not the required length: " "expected %d, got %d" % (len(a[0]), len(result))) + +# Ensure return type of Pandas.Series matches the arrow return type of the user-defined +# function. Otherwise, we may produce incorrect serialized data. +# Note: for timestamp type, we only need to ensure both types are timestamp because the +# serializer will do conversion. +try: +arrow_type_of_result = pa.from_numpy_dtype(result.dtype) +both_are_timestamp = pa.types.is_timestamp(arrow_type_of_result) and \ +pa.types.is_timestamp(arrow_return_type) +if not both_are_timestamp and arrow_return_type != arrow_type_of_result: +print("WARN: Arrow type %s of return Pandas.Series of the user-defined function's " --- End diff -- Will this appear when being run in an executor?
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22606 Thanks! Merged to master.
[GitHub] spark issue #22599: [SPARK-25581][SQL] Rename method `benchmark` as `runBenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96855/ Test PASSed.
[GitHub] spark issue #22599: [SPARK-25581][SQL] Rename method `benchmark` as `runBenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22599 Merged build finished. Test PASSed.
[GitHub] spark issue #22599: [SPARK-25581][SQL] Rename method `benchmark` as `runBenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22599 **[Test build #96855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96855/testReport)** for PR 22599 at commit [`1c15c25`](https://github.com/apache/spark/commit/1c15c25430b5084381c71215e4e2ea1f72f0af7c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22524 **[Test build #96860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96860/testReport)** for PR 22524 at commit [`1b2ab61`](https://github.com/apache/spark/commit/1b2ab6106f52558c48c8a82af6bda507b5189f64).