[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22615#discussion_r222069855
  
--- Diff: docs/building-spark.md ---
@@ -49,25 +49,20 @@ To create a Spark distribution like those distributed by the
 to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
 
 This will build Spark distribution along with Python pip and R packages. For more information on usage, run `./dev/make-distribution.sh --help`
 
 ## Specifying the Hadoop Version and Enabling YARN
 
 You can specify the exact version of Hadoop to compile against through the `hadoop.version` property.
-If unset, Spark will build against Hadoop 2.6.X by default.
 
 You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different
 from `hadoop.version`.
 
-Examples:
+Example:
 
-# Apache Hadoop 2.6.X
-./build/mvn -Pyarn -DskipTests clean package
-
-# Apache Hadoop 2.7.X and later
-./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
+./build/mvn -Pyarn -Dhadoop.version=2.7.7 -DskipTests clean package
--- End diff --

SPARK-25015 was reverted by SPARK-25330. Is it safe to give an example with `2.7.7` without a warning?
```
b0ada7dce0 [SPARK-25330][BUILD][BRANCH-2.3] Revert Hadoop 2.7 to 2.7.3
5f9633dc97 [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7
```
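
A hedged sketch of the safer alternative, if a warning is kept: pin `hadoop.version` to 2.7.3 (the version branch-2.3 was reverted to) until 2.7.7 is validated. This is not from the docs diff, just the existing example with an explicit override mirroring the SPARK-25330 revert:
```
# explicit Hadoop version override, mirroring the SPARK-25330 revert
./build/mvn -Pyarn -Dhadoop.version=2.7.3 -DskipTests clean package
```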



---




[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22614
  
**[Test build #96868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96868/testReport)** for PR 22614 at commit [`dddffca`](https://github.com/apache/spark/commit/dddffcae8824e72d614fd6202e7fc562c490098b).


---




[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...

2018-10-02 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22614
  
ok to test


---




[GitHub] spark pull request #22602: [SPARK-25538][SQL] Zero-out all bytes when writin...

2018-10-02 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22602#discussion_r222067876
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriterSuite.scala
 ---
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.codegen
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.types.Decimal
+
+class UnsafeWriterSuite extends SparkFunSuite {
--- End diff --

I don't think it is necessary, as we may also want to include tests for other `UnsafeWriter`s here in the future.


---




[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3637/
Test PASSed.


---




[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22615
  
**[Test build #96867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96867/testReport)** for PR 22615 at commit [`3b313bb`](https://github.com/apache/spark/commit/3b313bb83c84429b4d5840055523b1ca48489d19).


---




[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22615
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3636/
Test PASSed.


---




[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22605
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22615
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...

2018-10-02 Thread kmanamcheri
Github user kmanamcheri commented on the issue:

https://github.com/apache/spark/pull/22614
  
@mallman @cloud-fan @ericl @rezasafi 


---




[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-02 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/22615

[SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

## What changes were proposed in this pull request?

Remove Hadoop 2.6 references and make 2.7 the default.
Obviously, this is for master/3.0.0 only.
After this we can also get rid of the separate test jobs for Hadoop 2.6.

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-25016

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22615


commit 3b313bb83c84429b4d5840055523b1ca48489d19
Author: Sean Owen 
Date:   2018-10-02T18:37:40Z

Remove Hadoop 2.6 references and make 2.7 the default




---




[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22605
  
**[Test build #96866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96866/testReport)** for PR 22605 at commit [`6e917e5`](https://github.com/apache/spark/commit/6e917e57db33a92274e79de9906194fb650a9171).


---




[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22614
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22614: [SPARK-25561] [SQL] HiveClient.getPartitionsByFilter sho...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22614
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22614: [SPARK-25561] [SQL] HiveClient.getPartitionsByFilter sho...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22614
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22606
  
Thanks, @srowen and @gatorsmile. That sounds good to me.


---




[GitHub] spark pull request #22614: HiveClient.getPartitionsByFilter should not throw...

2018-10-02 Thread kmanamcheri
GitHub user kmanamcheri opened a pull request:

https://github.com/apache/spark/pull/22614

HiveClient.getPartitionsByFilter should not throw an exception if HMS retries directSql

## What changes were proposed in this pull request?

When using partition filter pushdown to HMS, Spark should expect a MetaException from HMS if partition filtering is not supported and should call getAllPartitions instead. HMS is expected to throw a MetaException even if directSql is enabled.
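
For illustration, a minimal sketch of that fallback (not the actual patch; `getPartitionsByFilter` and `getAllPartitions` stand in for the real HiveShim calls, only the control flow is the point):
```
import org.apache.hadoop.hive.metastore.api.MetaException

def getPartitionsWithFallback[T](
    getPartitionsByFilter: () => Seq[T],
    getAllPartitions: () => Seq[T]): Seq[T] = {
  try {
    getPartitionsByFilter()
  } catch {
    // HMS signals unsupported filter pushdown with a MetaException,
    // even when directSql is enabled, so fall back to all partitions
    case _: MetaException => getAllPartitions()
  }
}
```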

## How was this patch tested?

Unit tests on the Spark SQL component were run successfully.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kmanamcheri/spark SPARK-25561

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22614


commit dddffcae8824e72d614fd6202e7fc562c490098b
Author: Karthik Manamcheri 
Date:   2018-10-02T16:11:20Z

HiveShim should expect MetaException from HMS in all cases




---




[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r222061609
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala
 ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.hadoop.security.UserGroupInformation
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[kafka010] object KafkaSecurityHelper extends Logging {
+  def getKeytabJaasParams(sparkConf: SparkConf): Option[String] = {
+if (sparkConf.get(KEYTAB).nonEmpty) {
+  Some(getKrbJaasParams(sparkConf))
+} else {
+  None
+}
+  }
+
+  def getKrbJaasParams(sparkConf: SparkConf): String = {
+val serviceName = sparkConf.get(KAFKA_KERBEROS_SERVICE_NAME)
+require(serviceName.nonEmpty, "Kerberos service name must be defined")
+val keytab = sparkConf.get(KEYTAB)
+require(keytab.nonEmpty, "Keytab must be defined")
+val principal = sparkConf.get(PRINCIPAL)
+require(principal.nonEmpty, "Principal must be defined")
+
+val params =
+  s"""
+  |com.sun.security.auth.module.Krb5LoginModule required
--- End diff --

There is a whole section in the UGI code related to IBM JVMs changing the class names of the Kerberos stuff. Just assume that any class with "sun" or "oracle" in the name will be different there and you won't be disappointed.
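
A hedged sketch of one way to handle that (the IBM class name is my assumption, not something from this patch):
```
// pick the Krb5 login module class by JVM vendor; IBM JVMs ship it
// under a different package than com.sun
val krb5LoginModule =
  if (System.getProperty("java.vendor").toLowerCase.contains("ibm")) {
    "com.ibm.security.auth.module.Krb5LoginModule"
  } else {
    "com.sun.security.auth.module.Krb5LoginModule"
  }
```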


---




[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22606
  
Yes. Let us wait for the release of 2.4.


---




[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22602
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96859/
Test PASSed.


---




[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22602
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22602
  
**[Test build #96859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96859/testReport)** for PR 22602 at commit [`72b7c5c`](https://github.com/apache/spark/commit/72b7c5c16b33368bb332b168a99e99e13d4636cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22606
  
You mean http://spark.apache.org/versioning-policy.html and the reference 
to 2.4? I think that's still valid. When 2.4 is released, I'd propose to change 
that to refer to 3.0 being released .. I dunno .. around Mar 2019?


---




[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22488
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22488
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96864/
Test FAILed.


---




[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22500
  
**[Test build #96863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96863/testReport)** for PR 22500 at commit [`58fbdd6`](https://github.com/apache/spark/commit/58fbdd6ef0d51a8dc83b73a33e97563958dfae24).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22500
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96863/
Test FAILed.


---




[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22503
  
**[Test build #96865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96865/testReport)** for PR 22503 at commit [`695f676`](https://github.com/apache/spark/commit/695f6760e239a781ad8fb0b1e428944e73f79563).


---




[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22500
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22491
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96862/
Test FAILed.


---




[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22491
  
**[Test build #96862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96862/testReport)** for PR 22491 at commit [`1f492ee`](https://github.com/apache/spark/commit/1f492ee4dfde009bae23cc55ebd31061120b054e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222045619
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+
+private object ResourceRequestValidator {
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if a standard 
resource (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   *
+   * Example of an invalid config:
+   * - spark.yarn.driver.resource.memory=2g
+   *
+   * Please note that if multiple resources are defined like described 
above,
+   * the error messages will be concatenated.
+   * Example of such a config:
+   * - spark.yarn.driver.resource.memory=2g
+   * - spark.yarn.executor.resource.cores=2
+   * Then the following two error messages will be printed:
+   * - "memory cannot be requested with config 
spark.yarn.driver.resource.memory,
+   * please use config spark.driver.memory instead!
+   * - "cores cannot be requested with config 
spark.yarn.executor.resource.cores,
+   * please use config spark.executor.cores instead!
+   *
+   * @param sparkConf
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+val resourceDefinitions = Seq[(String, String)](
+  (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
+  (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"),
+  ("spark.driver.memory", YARN_DRIVER_RESOURCE_TYPES_PREFIX + 
"memory"),
--- End diff --

There is `DRIVER_MEMORY` and  `EXECUTOR_MEMORY`.
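
A sketch of the suggested change (assuming `DRIVER_MEMORY` and `EXECUTOR_MEMORY` resolve from `org.apache.spark.internal.config`):
```
val resourceDefinitions = Seq[(String, String)](
  (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
  (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"),
  // use the existing config entries rather than hardcoded key strings
  (DRIVER_MEMORY.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"),
  (EXECUTOR_MEMORY.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory"))
```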


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222049295
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("Resource type args propagate, resource type not defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
+val args = new ClientArguments(Array())
+
+val appContext = 
Records.newRecord(classOf[ApplicationSubmissionContext])
+val getNewApplicationResponse = 
Records.newRecord(classOf[GetNewApplicationResponse])
+val containerLaunchContext = 
Records.newRecord(classOf[ContainerLaunchContext])
+
+val client = new Client(args, sparkConf)
+
+try {
+  client.createApplicationSubmissionContext(
+new YarnClientApplication(getNewApplicationResponse, appContext),
+containerLaunchContext)
+} catch {
+  case NonFatal(e) =>
+val expectedExceptionClass = 
"org.apache.hadoop.yarn.exceptions.ResourceNotFoundException"
+if (e.getClass.getName != expectedExceptionClass) {
+  fail(s"Exception caught: $e is not an instance of 
$expectedExceptionClass!")
+}
+}
+  }
+
+  test("Resource type args propagate (client mode)") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu", 
"fpga"))
+
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
--- End diff --

I'm a little confused; shouldn't this cause an exception like in the test above?


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222049970
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("Resource type args propagate, resource type not defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
+val args = new ClientArguments(Array())
+
+val appContext = 
Records.newRecord(classOf[ApplicationSubmissionContext])
+val getNewApplicationResponse = 
Records.newRecord(classOf[GetNewApplicationResponse])
+val containerLaunchContext = 
Records.newRecord(classOf[ContainerLaunchContext])
+
+val client = new Client(args, sparkConf)
+
+try {
+  client.createApplicationSubmissionContext(
+new YarnClientApplication(getNewApplicationResponse, appContext),
+containerLaunchContext)
+} catch {
+  case NonFatal(e) =>
+val expectedExceptionClass = 
"org.apache.hadoop.yarn.exceptions.ResourceNotFoundException"
+if (e.getClass.getName != expectedExceptionClass) {
+  fail(s"Exception caught: $e is not an instance of 
$expectedExceptionClass!")
+}
+}
+  }
+
+  test("Resource type args propagate (client mode)") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu", 
"fpga"))
+
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
+  .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + 
"some_resource_with_units_1", "122m")
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "fpga", "222m")
+  .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "fpga", "223m")
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "memory", "1G")
+  .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory", "2G")
+val args = new ClientArguments(Array())
+
+val appContext = 
Records.newRecord(classOf[ApplicationSubmissionContext])
+val getNewApplicationResponse = 
Records.newRecord(classOf[GetNewApplicationResponse])
+val containerLaunchContext = 
Records.newRecord(classOf[ContainerLaunchContext])
+
+val client = new Client(args, sparkConf)
+client.createApplicationSubmissionContext(
+  new YarnClientApplication(getNewApplicationResponse, appContext),
+  containerLaunchContext)
+
+appContext.getAMContainerSpec should be (containerLaunchContext)
+appContext.getApplicationType should be ("SPARK")
+
TestYarnResourceRequestHelper.getResourceTypeValue(appContext.getResource,
+  "some_resource_with_units_1") should be (121)
+TestYarnResourceRequestHelper
+.getResourceTypeValue(appContext.getResource, "fpga") should be 
(222)
+  }
+
+  test("configuration and resource type args propagate (cluster mode)") {
--- End diff --

This is basically copy & pasted from the above. Better to create a helper method with parameters for what to check.
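
A rough sketch of such a helper, assembled from the pieces already in these tests (the name is illustrative):
```
private def verifyResourceTypePropagation(
    sparkConf: SparkConf,
    expectedResources: Map[String, Long]): Unit = {
  val args = new ClientArguments(Array())
  val appContext = Records.newRecord(classOf[ApplicationSubmissionContext])
  val getNewApplicationResponse =
    Records.newRecord(classOf[GetNewApplicationResponse])
  val containerLaunchContext =
    Records.newRecord(classOf[ContainerLaunchContext])

  val client = new Client(args, sparkConf)
  client.createApplicationSubmissionContext(
    new YarnClientApplication(getNewApplicationResponse, appContext),
    containerLaunchContext)

  // every expected (resource name -> value) pair gets the same assertion
  expectedResources.foreach { case (name, expected) =>
    TestYarnResourceRequestHelper.getResourceTypeValue(
      appContext.getResource, name) should be (expected)
  }
}
```
Each test then reduces to building its `SparkConf` and calling the helper.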


---




[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22606
  
It's about 3.0.0, so I guess the PMC knows the details and the future. :)


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222053715
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
 ---
@@ -134,6 +166,42 @@ class YarnAllocatorSuite extends SparkFunSuite with 
Matchers with BeforeAndAfter
 size should be (0)
   }
 
+  test("custom resource type requested from yarn") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu"))
+
+// request a single container and receive it
+val handler = createAllocatorWithAdditionalConfigs(1, Map(
+  YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "gpu" -> "2G",
+  YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory" -> "1G"
+))
+handler.updateResourceRequests()
+handler.getNumExecutorsRunning should be (0)
+handler.getPendingAllocate.size should be (1)
+
+val resource = Resource.newInstance(3072, 6)
+val resourceTypes = Map("gpu" -> "2G")
+ResourceRequestHelper.setResourceRequests(resourceTypes, resource)
+
+val container = createContainerWithResource("host1", resource)
+handler.handleAllocatedContainers(Array(container))
+
+// verify custom resource type is part of rmClient.ask set
+val askField = rmClient.getClass.getDeclaredField("ask")
--- End diff --

I'm not sure this is testing anything interesting. It's basically testing YARN code, not Spark code, and using reflection for that...

The code you're actually adding to `YarnAllocator`, namely the code that creates the `resource` field in that class, is not being tested. Instead, you're manually creating a `Resource` in this test.

Testing that code would be better than what you have here.
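
For instance (hedged; the accessor for the allocator's `resource` field is my assumption, the real visibility may differ):
```
// assert on the Resource that YarnAllocator builds from the executor
// configs, instead of reflecting into the rmClient's "ask" field
TestYarnResourceRequestHelper.getResourceTypeValue(
  handler.resource, "gpu") should be (2)
```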


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222052506
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/TestYarnResourceRequestHelper.scala
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+object TestYarnResourceRequestHelper extends Logging {
+  def initializeResourceTypes(resourceTypes: List[String]): Unit = {
+if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) {
+  throw new IllegalStateException("initializeResourceTypes() should 
not be invoked " +
+"since YARN resource types is not available because of old Hadoop 
version!" )
+}
+
+val allResourceTypes = new ListBuffer[AnyRef]
+val defaultResourceTypes = List(
+  createResourceTypeInfo("memory-mb"),
+  createResourceTypeInfo("vcores"))
+val customResourceTypes = resourceTypes.map(rt => 
createResourceTypeInfo(rt))
+allResourceTypes ++= defaultResourceTypes
+allResourceTypes ++= customResourceTypes
+
+reinitializeResources(allResourceTypes)
+  }
+
+  private def createResourceTypeInfo(resourceName: String): AnyRef = {
+val resTypeInfoClass = 
Utils.classForName("org.apache.hadoop.yarn.api.records.ResourceTypeInfo")
+val resTypeInfoNewInstanceMethod = 
resTypeInfoClass.getMethod("newInstance", classOf[String])
+resTypeInfoNewInstanceMethod.invoke(null, resourceName)
+  }
+
+  private def reinitializeResources(resourceTypes: ListBuffer[AnyRef]): 
Unit = {
+val resourceUtilsClass =
+  
Utils.classForName("org.apache.hadoop.yarn.util.resource.ResourceUtils")
+val reinitializeResourcesMethod = 
resourceUtilsClass.getMethod("reinitializeResources",
+  classOf[java.util.List[AnyRef]])
+reinitializeResourcesMethod.invoke(null, resourceTypes.asJava)
+  }
+
+  def getResourceTypeValue(res: Resource, name: String): AnyRef = {
+val resourceInformation: AnyRef = getResourceInformation(res, name)
+invokeMethod(resourceInformation, "getValue")
+  }
+
+  def getResourceInformationByName(res: Resource, nameParam: String): 
ResourceInformation = {
+val resourceInformation: AnyRef = getResourceInformation(res, 
nameParam)
+val name = invokeMethod(resourceInformation, 
"getName").asInstanceOf[String]
+val value = invokeMethod(resourceInformation, 
"getValue").asInstanceOf[Long]
+val units = invokeMethod(resourceInformation, 
"getUnits").asInstanceOf[String]
+new ResourceInformation(name, value, units)
+  }
+
+  private def getResourceInformation(res: Resource, name: String) = {
+if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) {
+  throw new IllegalStateException("assertResourceTypeValue() should 
not be invoked " +
+"since yarn resource types is not available because of old Hadoop 
version!")
+}
+
+val getResourceInformationMethod = 
res.getClass.getMethod("getResourceInformation",
+  classOf[String])
+val resourceInformation = getResourceInformationMethod.invoke(res, 
name)
+resourceInformation
+  }
+
+  private def invokeMethod(resourceInformation: AnyRef, methodName: 
String): AnyRef = {
+val getValueMethod = resourceInformation.getClass.getMethod(methodName)
+getValueMethod.invoke(resourceInformation)
+  }
+
+  class ResourceInformation(val name: String, val value: Long, val units: 
String) {
--- End diff --

Remove braces; make this a case class, then you don't need `val`s and can compare them more easily.

`assert(resource === ResourceInformation("blah", 1L, "m"))`
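
In other words (a one-line sketch):
```
case class ResourceInformation(name: String, value: Long, units: String)
```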


---


[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222048057
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("Resource type args propagate, resource type not defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
+val args = new ClientArguments(Array())
+
+val appContext = 
Records.newRecord(classOf[ApplicationSubmissionContext])
+val getNewApplicationResponse = 
Records.newRecord(classOf[GetNewApplicationResponse])
+val containerLaunchContext = 
Records.newRecord(classOf[ContainerLaunchContext])
+
+val client = new Client(args, sparkConf)
+
+try {
+  client.createApplicationSubmissionContext(
+new YarnClientApplication(getNewApplicationResponse, appContext),
+containerLaunchContext)
+} catch {
+  case NonFatal(e) =>
+val expectedExceptionClass = 
"org.apache.hadoop.yarn.exceptions.ResourceNotFoundException"
+if (e.getClass.getName != expectedExceptionClass) {
--- End diff --

`assert(blah === blah2)`
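
A sketch of that, using ScalaTest's `intercept` with the names from this test:
```
val e = intercept[Exception] {
  client.createApplicationSubmissionContext(
    new YarnClientApplication(getNewApplicationResponse, appContext),
    containerLaunchContext)
}
assert(e.getClass.getName ===
  "org.apache.hadoop.yarn.exceptions.ResourceNotFoundException")
```
Unlike the current try/catch, this also fails the test when no exception is thrown at all.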


---




[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22491
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222043627
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Integer => JInteger, Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+
+  def setResourceRequests(
+  resources: Map[String, String],
+  resource: Resource): Unit = {
+require(resource != null, "Resource parameter should not be null!")
+
+logDebug(s"Custom resource types requested: $resources")
+if (!isYarnResourceTypesAvailable()) {
+  if (resources.nonEmpty) {
+logWarning("Ignoring updating resource with resource types because 
" +
+"the version of YARN does not support it!")
+  }
+  return
+}
+
+val resInfoClass = Utils.classForName(
+  "org.apache.hadoop.yarn.api.records.ResourceInformation")
+val setResourceInformationMethod =
+  resource.getClass.getMethod("setResourceInformation", 
classOf[String],
+resInfoClass)
--- End diff --

Fits in previous line.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222043765
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Integer => JInteger, Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+
+  def setResourceRequests(
+  resources: Map[String, String],
+  resource: Resource): Unit = {
+require(resource != null, "Resource parameter should not be null!")
+
+logDebug(s"Custom resource types requested: $resources")
+if (!isYarnResourceTypesAvailable()) {
+  if (resources.nonEmpty) {
+logWarning("Ignoring updating resource with resource types because 
" +
+"the version of YARN does not support it!")
+  }
+  return
+}
+
+val resInfoClass = Utils.classForName(
+  "org.apache.hadoop.yarn.api.records.ResourceInformation")
--- End diff --

Fits in previous line.

You also have the same class name hardcoded below; might as well create a constant.
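
For example (a sketch):
```
private val RESOURCE_INFO_CLASS =
  "org.apache.hadoop.yarn.api.records.ResourceInformation"
```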


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222047253
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -35,13 +36,13 @@ import org.apache.hadoop.yarn.conf.YarnConfiguration
 import org.apache.hadoop.yarn.util.Records
 import org.mockito.Matchers.{eq => meq, _}
 import org.mockito.Mockito._
-import org.scalatest.Matchers
+import org.scalatest.{BeforeAndAfterAll, Matchers}
 
 import org.apache.spark.{SparkConf, SparkFunSuite, TestUtils}
 import org.apache.spark.deploy.yarn.config._
 import org.apache.spark.util.{SparkConfWithEnv, Utils}
 
-class ClientSuite extends SparkFunSuite with Matchers {
+class ClientSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterAll {
--- End diff --

Where are you using `BeforeAndAfterAll`?


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222044641
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+
+private object ResourceRequestValidator {
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if a standard 
resource (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   *
+   * Example of an invalid config:
+   * - spark.yarn.driver.resource.memory=2g
+   *
+   * Please note that if multiple resources are defined like described 
above,
+   * the error messages will be concatenated.
+   * Example of such a config:
+   * - spark.yarn.driver.resource.memory=2g
+   * - spark.yarn.executor.resource.cores=2
+   * Then the following two error messages will be printed:
+   * - "memory cannot be requested with config 
spark.yarn.driver.resource.memory,
+   * please use config spark.driver.memory instead!
+   * - "cores cannot be requested with config 
spark.yarn.executor.resource.cores,
+   * please use config spark.executor.cores instead!
+   *
+   * @param sparkConf
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
--- End diff --

This is now so small that it can be in the helper class above.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222047755
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("Resource type args propagate, resource type not defined") {
--- End diff --

I can't really parse what the test description means... (also in the other tests)

Maybe you mean `test("resource request for invalid resource")`.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222043939
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Integer => JInteger, Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+
+  def setResourceRequests(
+  resources: Map[String, String],
+  resource: Resource): Unit = {
+require(resource != null, "Resource parameter should not be null!")
+
+logDebug(s"Custom resource types requested: $resources")
+if (!isYarnResourceTypesAvailable()) {
+  if (resources.nonEmpty) {
+logWarning("Ignoring updating resource with resource types because 
" +
+"the version of YARN does not support it!")
+  }
+  return
+}
+
+val resInfoClass = Utils.classForName(
+  "org.apache.hadoop.yarn.api.records.ResourceInformation")
+val setResourceInformationMethod =
+  resource.getClass.getMethod("setResourceInformation", 
classOf[String],
+resInfoClass)
+resources.foreach { case (name, rawAmount) =>
+  try {
+val AMOUNT_AND_UNIT_REGEX(amountPart, unitPart) = rawAmount
+val amount = amountPart.toLong
+val unit = unitPart match {
+  case "g" => "G"
+  case "t" => "T"
+  case "p" => "P"
+  case _ => unitPart
+}
+logDebug(s"Registering resource with name: $name, amount: $amount, 
unit: $unit")
+val resourceInformation =
+  createResourceInformation(name, amount, unit, resInfoClass)
--- End diff --

Fits in previous line.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222052098
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/TestYarnResourceRequestHelper.scala
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+object TestYarnResourceRequestHelper extends Logging {
+  def initializeResourceTypes(resourceTypes: List[String]): Unit = {
+if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) {
+  throw new IllegalStateException("initializeResourceTypes() should 
not be invoked " +
+"since YARN resource types is not available because of old Hadoop 
version!" )
+}
+
+val allResourceTypes = new ListBuffer[AnyRef]
+val defaultResourceTypes = List(
+  createResourceTypeInfo("memory-mb"),
+  createResourceTypeInfo("vcores"))
+val customResourceTypes = resourceTypes.map(rt => 
createResourceTypeInfo(rt))
+allResourceTypes ++= defaultResourceTypes
+allResourceTypes ++= customResourceTypes
+
+reinitializeResources(allResourceTypes)
+  }
+
+  private def createResourceTypeInfo(resourceName: String): AnyRef = {
+val resTypeInfoClass = 
Utils.classForName("org.apache.hadoop.yarn.api.records.ResourceTypeInfo")
+val resTypeInfoNewInstanceMethod = 
resTypeInfoClass.getMethod("newInstance", classOf[String])
+resTypeInfoNewInstanceMethod.invoke(null, resourceName)
+  }
+
+  private def reinitializeResources(resourceTypes: ListBuffer[AnyRef]): 
Unit = {
+val resourceUtilsClass =
+  
Utils.classForName("org.apache.hadoop.yarn.util.resource.ResourceUtils")
+val reinitializeResourcesMethod = 
resourceUtilsClass.getMethod("reinitializeResources",
+  classOf[java.util.List[AnyRef]])
+reinitializeResourcesMethod.invoke(null, resourceTypes.asJava)
+  }
+
+  def getResourceTypeValue(res: Resource, name: String): AnyRef = {
+val resourceInformation: AnyRef = getResourceInformation(res, name)
+invokeMethod(resourceInformation, "getValue")
+  }
+
+  def getResourceInformationByName(res: Resource, nameParam: String): 
ResourceInformation = {
+val resourceInformation: AnyRef = getResourceInformation(res, 
nameParam)
+val name = invokeMethod(resourceInformation, 
"getName").asInstanceOf[String]
+val value = invokeMethod(resourceInformation, 
"getValue").asInstanceOf[Long]
+val units = invokeMethod(resourceInformation, 
"getUnits").asInstanceOf[String]
+new ResourceInformation(name, value, units)
+  }
+
+  private def getResourceInformation(res: Resource, name: String) = {
--- End diff --

missing return type
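
For example, with the return type spelled out (a sketch of the same method, minus the availability check):
```
private def getResourceInformation(res: Resource, name: String): AnyRef = {
  val method = res.getClass.getMethod("getResourceInformation", classOf[String])
  method.invoke(res, name)
}
```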


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222044918
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+
+private object ResourceRequestValidator {
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if a standard 
resource (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
--- End diff --

Scaladoc does not need HTML tags.
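
For example, backticks render fine in Scaladoc — a hypothetical tag-free version of the comment above:

```scala
/**
 * Validates sparkConf and throws a SparkException if a standard resource
 * (memory or cores) is defined with a property like
 * `spark.yarn.driver.resource.memory`.
 */
```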


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222051520
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/TestYarnResourceRequestHelper.scala
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+object TestYarnResourceRequestHelper extends Logging {
--- End diff --

Better name: `ResourceRequestTestHelper`. Also you're not using `Logging` 
at all.
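
Applying both suggestions, the declaration would shrink to something like (sketch):

```scala
object ResourceRequestTestHelper {
  // helpers unchanged; the `Logging` mixin and the now-unused
  // org.apache.spark.internal.Logging import can be dropped
}
```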


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222050707
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestValidatorSuite.scala
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.scalatest.{BeforeAndAfterAll, Matchers}
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+
+class ResourceRequestValidatorSuite extends SparkFunSuite with Matchers 
with BeforeAndAfterAll {
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+  }
+
+  test("empty SparkConf should be valid") {
+val sparkConf = new SparkConf()
+ResourceRequestValidator.validateResources(sparkConf)
+  }
+
+  test("just normal resources are defined") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.driver.memory", "3G")
+sparkConf.set("spark.driver.cores", "4")
+sparkConf.set("spark.executor.memory", "4G")
+sparkConf.set("spark.executor.cores", "2")
+
+ResourceRequestValidator.validateResources(sparkConf)
+  }
+
+  test("Memory defined with new config for executor") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.yarn.executor.resource.memory", "30G")
+
+val thrown = intercept[SparkException] {
+  ResourceRequestValidator.validateResources(sparkConf)
+}
+thrown.getMessage should include 
("spark.yarn.executor.resource.memory")
+  }
+
+  test("Cores defined with new config for executor") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.yarn.executor.resource.cores", "5")
+
+val thrown = intercept[SparkException] {
+  ResourceRequestValidator.validateResources(sparkConf)
+}
+thrown.getMessage should include ("spark.yarn.executor.resource.cores")
+  }
+
+  test("Memory defined with new config, client mode") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.yarn.am.resource.memory", "1G")
+
+val thrown = intercept[SparkException] {
+  ResourceRequestValidator.validateResources(sparkConf)
+}
+thrown.getMessage should include ("spark.yarn.am.resource.memory")
+  }
+
+  test("Memory defined with new config for driver, cluster mode") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.yarn.driver.resource.memory", "1G")
+
+val thrown = intercept[SparkException] {
+  ResourceRequestValidator.validateResources(sparkConf)
+}
+thrown.getMessage should include ("spark.yarn.driver.resource.memory")
+  }
+
+  test("Cores defined with new config, client mode") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.driver.memory", "2G")
+sparkConf.set("spark.driver.cores", "4")
+sparkConf.set("spark.executor.memory", "4G")
+sparkConf.set("spark.yarn.am.cores", "2")
+
+sparkConf.set("spark.yarn.am.resource.cores", "3")
+
+val thrown = intercept[SparkException] {
+  ResourceRequestValidator.validateResources(sparkConf)
+}
+thrown.getMessage should include ("spark.yarn.am.resource.cores")
+  }
+
+  test("Cores defined with new config for driver, cluster mode") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.yarn.driver.resource.cores", "1G")
+
+val thrown = intercept[SparkException] {
+  ResourceRequestValidator.validateResources(sparkConf)
+}
+thrown.getMessage should include ("spark.yarn.driver.resource.cores")
+  }
+
+  test("various duplicated definitions") {
+val sparkConf = new SparkConf()
+sparkConf.set("spark.driver.memory", "2G")
--- End diff --

You need to use existing constants all over this test code.
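
For example, using the entries already defined in `org.apache.spark.internal.config` instead of raw strings (a sketch, assuming those constants are imported in the test):

```scala
import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}

// Equivalent to sparkConf.set("spark.driver.memory", "2G") etc.,
// but robust against typos in the literal key strings.
sparkConf.set(DRIVER_MEMORY.key, "2G")
sparkConf.set(EXECUTOR_MEMORY.key, "4G")
```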


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222046191
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+
+private object ResourceRequestValidator {
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if a standard 
resource (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   *
+   * Example of an invalid config:
+   * - spark.yarn.driver.resource.memory=2g
+   *
+   * Please note that if multiple resources are defined like described 
above,
+   * the error messages will be concatenated.
+   * Example of such a config:
+   * - spark.yarn.driver.resource.memory=2g
+   * - spark.yarn.executor.resource.cores=2
+   * Then the following two error messages will be printed:
+   * - "memory cannot be requested with config 
spark.yarn.driver.resource.memory,
+   * please use config spark.driver.memory instead!
+   * - "cores cannot be requested with config 
spark.yarn.executor.resource.cores,
+   * please use config spark.executor.cores instead!
+   *
+   * @param sparkConf
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+val resourceDefinitions = Seq[(String, String)](
+  (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
+  (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"),
+  ("spark.driver.memory", YARN_DRIVER_RESOURCE_TYPES_PREFIX + 
"memory"),
+  (DRIVER_CORES.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "cores"),
+  ("spark.executor.memory", YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + 
"memory"),
+  (EXECUTOR_CORES.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "cores"))
+val errorMessage = new mutable.StringBuilder()
+
+resourceDefinitions.foreach { case (sparkName, resourceRequest) =>
+  if (sparkConf.contains(resourceRequest)) {
+errorMessage.append(s"$ERROR_PREFIX Do not use $resourceRequest, " 
+
+s"please use $sparkName instead!\n")
+  }
+}
+
+// throw exception after loop
--- End diff --

Unnecessary comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222045360
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestValidator.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+
+private object ResourceRequestValidator {
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if a standard 
resource (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   *
+   * Example of an invalid config:
--- End diff --

Everything from here to the end of this comment is superfluous. Especially 
the part where you mention error messages that don't even match the code 
anymore...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222044125
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Integer => JInteger, Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+
+  def setResourceRequests(
+  resources: Map[String, String],
+  resource: Resource): Unit = {
+require(resource != null, "Resource parameter should not be null!")
+
+logDebug(s"Custom resource types requested: $resources")
+if (!isYarnResourceTypesAvailable()) {
+  if (resources.nonEmpty) {
+logWarning("Ignoring updating resource with resource types because 
" +
+"the version of YARN does not support it!")
+  }
+  return
+}
+
+val resInfoClass = Utils.classForName(
+  "org.apache.hadoop.yarn.api.records.ResourceInformation")
+val setResourceInformationMethod =
+  resource.getClass.getMethod("setResourceInformation", 
classOf[String],
+resInfoClass)
+resources.foreach { case (name, rawAmount) =>
+  try {
+val AMOUNT_AND_UNIT_REGEX(amountPart, unitPart) = rawAmount
+val amount = amountPart.toLong
+val unit = unitPart match {
+  case "g" => "G"
+  case "t" => "T"
+  case "p" => "P"
+  case _ => unitPart
+}
+logDebug(s"Registering resource with name: $name, amount: $amount, 
unit: $unit")
+val resourceInformation =
+  createResourceInformation(name, amount, unit, resInfoClass)
+setResourceInformationMethod.invoke(
+  resource, name, resourceInformation.asInstanceOf[AnyRef])
+  } catch {
+case _: MatchError => throw new IllegalArgumentException(
--- End diff --

For multi-line statements, start the case in the next line.
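
I.e., something like this for the catch block above (the message text is assumed, since the original is cut off; only the placement of the `case` body matters):

```scala
} catch {
  case _: MatchError =>
    throw new IllegalArgumentException(
      s"Resource request for '$name' ('$rawAmount') is invalid!")
}
```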


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r222049404
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,92 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("Resource type args propagate, resource type not defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
+val args = new ClientArguments(Array())
+
+val appContext = 
Records.newRecord(classOf[ApplicationSubmissionContext])
+val getNewApplicationResponse = 
Records.newRecord(classOf[GetNewApplicationResponse])
+val containerLaunchContext = 
Records.newRecord(classOf[ContainerLaunchContext])
+
+val client = new Client(args, sparkConf)
+
+try {
+  client.createApplicationSubmissionContext(
+new YarnClientApplication(getNewApplicationResponse, appContext),
+containerLaunchContext)
+} catch {
+  case NonFatal(e) =>
+val expectedExceptionClass = 
"org.apache.hadoop.yarn.exceptions.ResourceNotFoundException"
+if (e.getClass.getName != expectedExceptionClass) {
+  fail(s"Exception caught: $e is not an instance of 
$expectedExceptionClass!")
+}
+}
+  }
+
+  test("Resource type args propagate (client mode)") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+TestYarnResourceRequestHelper.initializeResourceTypes(List("gpu", 
"fpga"))
+
+val sparkConf = new SparkConf()
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "some_resource_with_units_1", 
"121m")
+  .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + 
"some_resource_with_units_1", "122m")
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "fpga", "222m")
+  .set(YARN_DRIVER_RESOURCE_TYPES_PREFIX + "fpga", "223m")
+  .set(YARN_AM_RESOURCE_TYPES_PREFIX + "memory", "1G")
--- End diff --

Shouldn't this cause an exception because of the validation?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-02 Thread justinuang
Github user justinuang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r222053706
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -212,6 +212,8 @@ class CSVOptions(
 settings.setEmptyValue(emptyValueInRead)
 settings.setMaxCharsPerColumn(maxCharsPerColumn)
 
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+settings.setLineSeparatorDetectionEnabled(multiLine)
--- End diff --

done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3635/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22491
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22488
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22488
  
**[Test build #96864 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96864/testReport)**
 for PR 22488 at commit 
[`7990e13`](https://github.com/apache/spark/commit/7990e139cf307c631eb95a1f7b410a85120b07c5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22500
  
**[Test build #96863 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96863/testReport)**
 for PR 22500 at commit 
[`58fbdd6`](https://github.com/apache/spark/commit/58fbdd6ef0d51a8dc83b73a33e97563958dfae24).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22500
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3633/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22491
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3634/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22491
  
**[Test build #96862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96862/testReport)**
 for PR 22491 at commit 
[`1f492ee`](https://github.com/apache/spark/commit/1f492ee4dfde009bae23cc55ebd31061120b054e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22500
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22605
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3632/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22488
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...

2018-10-02 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22503
  
@HyukjinKwon is this ready to be merged in, or is there more feedback to be 
addressed?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22491
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use main m...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22500
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22605
  
**[Test build #96861 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96861/testReport)**
 for PR 22605 at commit 
[`fb116f0`](https://github.com/apache/spark/commit/fb116f0589471dc8d0b9787b308fc54fc5004cfc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22605: [SPARK-25589][SQL][TEST] Add BloomFilterBenchmark

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22605
  
Rebased to `master` due to https://github.com/apache/spark/pull/22599.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread samdvr
Github user samdvr commented on the issue:

https://github.com/apache/spark/pull/22596
  
Thanks, will do!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread samdvr
Github user samdvr closed the pull request at:

https://github.com/apache/spark/pull/22596


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22606
  
@dongjoon-hyun Could you help me fix the website?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22596
  
Since it's merged into `branch-2.2`, could you close this PR? GitHub only 
auto-closes PRs against the `master` branch. :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22596
  
BTW, I chose the most official one, `Sam Davarnia `. 
But it seems that GitHub is not happy with that. Next time, it would be 
better to keep the author info in your commit messages consistent.
```
Lead-authored-by: Sam Davarnia 
Co-authored-by: Sam Davarnia <>
Co-authored-by: Dongjoon Hyun 
Co-authored-by: Sam Davarnia 
Signed-off-by: Dongjoon Hyun 
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22607: [SPARK-24530][followup] run Sphinx with python 3 ...

2018-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22607


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22607: [SPARK-24530][followup] run Sphinx with python 3 in dock...

2018-10-02 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22607
  
Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22599: [SPARK-25581][SQL] Rename method `benchmark` as `...

2018-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22599


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22609
  
@gengliangwang is right; there are still things that need to be done when 
the UI is disabled: job, stage, and executor info are all still exposed even 
then, and the task event callbacks update those.

It might be as easy as not adding the task to `liveTasks` in `onTaskStart`, 
but haven't really looked in detail.
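
A rough sketch of that idea (hypothetical names throughout — the `liveUIEnabled` flag and the `LiveTask` constructor shape are assumptions, and this is not the actual `AppStatusListener` code):

```scala
override def onTaskStart(event: SparkListenerTaskStart): Unit = {
  // Skip per-task bookkeeping when the live UI is off...
  if (liveUIEnabled) {
    liveTasks.put(event.taskInfo.taskId,
      new LiveTask(event.taskInfo, event.stageId, event.stageAttemptId, None))
  }
  // ...but keep updating the job/stage/executor aggregates here, since
  // those stay exposed even with the UI disabled.
}
```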


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-02 Thread peter-toth
Github user peter-toth commented on the issue:

https://github.com/apache/spark/pull/22318
  
Thanks @viirya, your analysis is correct.

Unfortunately an attribute doesn't have a reference to its dataset, so I 
don't think this scenario can be solved easily. I believe a proper solution 
would be something like https://github.com/apache/spark/pull/21449 

But I would argue that my PR still makes sense: an `a("id") === 
b("id")` condition in a join where `c` is joined is not expected, and is 
actually very likely a typo.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22606
  
Great, @gatorsmile! Could you update the Spark website accordingly, too?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22602: [SPARK-25538][SQL] Zero-out all bytes when writin...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22602#discussion_r222032210
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriterSuite.scala
 ---
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.codegen
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.types.Decimal
+
+class UnsafeWriterSuite extends SparkFunSuite {
--- End diff --

`UnsafeWriterSuite` -> `UnsafeRowWriterSuite`? Also, renaming the file?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22602
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96857/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22602
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22602
  
**[Test build #96857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96857/testReport)**
 for PR 22602 at commit 
[`64f4ed0`](https://github.com/apache/spark/commit/64f4ed0e286d1a01192400203e167d046cb800f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-02 Thread rezasafi
Github user rezasafi commented on the issue:

https://github.com/apache/spark/pull/22612
  
@squito


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22602
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96858/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22596
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22596: [SPARK-25576][BUILD][BRANCH-2.2] Fix lint failure

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22596
  
**[Test build #96858 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96858/testReport)**
 for PR 22596 at commit 
[`07156d4`](https://github.com/apache/spark/commit/07156d4d8d3ac40b47a825ecb4fa01bf7eaae1ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22610#discussion_r222015287
  
--- Diff: python/pyspark/worker.py ---
@@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type):
 arrow_return_type = to_arrow_type(return_type)
 
 def verify_result_length(*a):
+import pyarrow as pa
 result = f(*a)
 if not hasattr(result, "__len__"):
 raise TypeError("Return type of the user-defined function 
should be "
 "Pandas.Series, but is 
{}".format(type(result)))
 if len(result) != len(a[0]):
 raise RuntimeError("Result vector from pandas_udf was not the 
required length: "
"expected %d, got %d" % (len(a[0]), 
len(result)))
+
+# Ensure return type of Pandas.Series matches the arrow return 
type of the user-defined
+# function. Otherwise, we may produce incorrect serialized data.
+# Note: for timestamp type, we only need to ensure both types are 
timestamp because the
+# serializer will do conversion.
+try:
+arrow_type_of_result = pa.from_numpy_dtype(result.dtype)
+both_are_timestamp = 
pa.types.is_timestamp(arrow_type_of_result) and \
+pa.types.is_timestamp(arrow_return_type)
+if not both_are_timestamp and arrow_return_type != 
arrow_type_of_result:
+print("WARN: Arrow type %s of return Pandas.Series of the 
user-defined function's "
--- End diff --

No, but like the other print usages in `worker.py`, I think this can be seen 
in the worker log?

This is also useful when testing in the pyspark shell.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-02 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22610
  
Thanks for looking into this @viirya !  I agree that there are lots of 
cases where casting to another type is intentional and works fine, so this 
isn't a bug. The only other idea I have is to provide an option to raise an 
error if the type needs to be cast. That might be possible with pyarrow right 
now, but I'm not sure how useful it would be.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22606


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-02 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22610#discussion_r222007837
  
--- Diff: python/pyspark/worker.py ---
@@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type):
 arrow_return_type = to_arrow_type(return_type)
 
 def verify_result_length(*a):
+import pyarrow as pa
 result = f(*a)
 if not hasattr(result, "__len__"):
 raise TypeError("Return type of the user-defined function 
should be "
 "Pandas.Series, but is 
{}".format(type(result)))
 if len(result) != len(a[0]):
 raise RuntimeError("Result vector from pandas_udf was not the 
required length: "
"expected %d, got %d" % (len(a[0]), 
len(result)))
+
+# Ensure return type of Pandas.Series matches the arrow return 
type of the user-defined
+# function. Otherwise, we may produce incorrect serialized data.
+# Note: for timestamp type, we only need to ensure both types are 
timestamp because the
+# serializer will do conversion.
+try:
+arrow_type_of_result = pa.from_numpy_dtype(result.dtype)
+both_are_timestamp = 
pa.types.is_timestamp(arrow_type_of_result) and \
+pa.types.is_timestamp(arrow_return_type)
+if not both_are_timestamp and arrow_return_type != 
arrow_type_of_result:
+print("WARN: Arrow type %s of return Pandas.Series of the 
user-defined function's "
--- End diff --

Will this appear when being run in an executor?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22606
  
Thanks! Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22599: [SPARK-25581][SQL] Rename method `benchmark` as `runBenc...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96855/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22599: [SPARK-25581][SQL] Rename method `benchmark` as `runBenc...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22599
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22599: [SPARK-25581][SQL] Rename method `benchmark` as `runBenc...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22599
  
**[Test build #96855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96855/testReport)**
 for PR 22599 at commit 
[`1c15c25`](https://github.com/apache/spark/commit/1c15c25430b5084381c71215e4e2ea1f72f0af7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22524
  
**[Test build #96860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96860/testReport)**
 for PR 22524 at commit 
[`1b2ab61`](https://github.com/apache/spark/commit/1b2ab6106f52558c48c8a82af6bda507b5189f64).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


