[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r214505595
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1349,6 +1357,12 @@ object ConvertToLocalRelation extends 
Rule[LogicalPlan] {
 
 case Limit(IntegerLiteral(limit), LocalRelation(output, data, 
isStreaming)) =>
   LocalRelation(output, data.take(limit), isStreaming)
+
+case Filter(condition, LocalRelation(output, data, isStreaming))
--- End diff --

super nit: the comment at 
https://github.com/apache/spark/pull/22205/files#diff-a636a87d8843eeccca90140be91d4fafR1348
 was not updated.


---




[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses a UDF from a non de...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22307
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses a UDF from a non de...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22307
  
**[Test build #95573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95573/testReport)**
 for PR 22307 at commit 
[`60cc1c9`](https://github.com/apache/spark/commit/60cc1c9c66dade490dc0501622f8ac6b554b7ff4).


---




[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses a UDF from a non de...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2756/
Test PASSed.


---




[GitHub] spark pull request #22307: [SPARK-25301][SQL] When a view uses a UDF from a...

2018-08-31 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/22307

[SPARK-25301][SQL] When a view uses a UDF from a non-default database, 
Spark analyser throws AnalysisException

## What changes were proposed in this pull request?
When a Hive view uses a UDF from a non-default database, the Spark analyser 
throws AnalysisException.

Steps to simulate this issue
-
In Hive

1) CREATE DATABASE d100;
2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom UDF class 
'com.uzx.udf.Masking'
3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
created in d100
4) create view d100.v100 as select d100.udf100(name) from default.emp; 
// Note: table default.emp has two columns 'name', 'address'
5) select * from d100.v100; // a query on view d100.v100 gives the correct result

In Spark
-
1) spark.sql("select * from d100.v100").show
throws 
```
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database 'default'
```

This is because, while parsing the SQL statement of the view 
``` 'select `d100.udf100`(`emp`.`name`) from `default`.`emp`' ```, the Spark 
parser fails to split the database name from the UDF name, so the Spark 
function registry tries to load the UDF 'd100.udf100' from the 'default' 
database.

To solve this issue, before creating the 'FunctionIdentifier', get the actual 
database name and then create the FunctionIdentifier from that database name 
and function name.
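
A minimal sketch of the proposed idea, for illustration only (the helper name 
is hypothetical; this is not the patch's actual code):

```scala
import org.apache.spark.sql.catalyst.FunctionIdentifier

// Hypothetical helper: split a qualified name such as "d100.udf100" into a
// database part and a function part, so the function registry looks up the
// UDF in the right database instead of defaulting to 'default'.
def toFunctionIdentifier(qualifiedName: String): FunctionIdentifier =
  qualifiedName.split('.') match {
    case Array(db, fn) => FunctionIdentifier(fn, Some(db))
    case Array(fn)     => FunctionIdentifier(fn) // unqualified: current database
  }

// toFunctionIdentifier("d100.udf100") == FunctionIdentifier("udf100", Some("d100"))
```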

## How was this patch tested?
Added 1 unit test 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_fix_view_with_udf_issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22307


commit 60cc1c9c66dade490dc0501622f8ac6b554b7ff4
Author: Vinod KC 
Date:   2018-08-31T16:57:00Z

fix issue with non default udf in hive view




---




[GitHub] spark pull request #22218: [SPARK-25228][CORE] Add executor CPU time metric.

2018-08-31 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214505265
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  /** Dropwizard metrics gauge measuring the executor's process CPU time.
+   *  This code will try to get JVM Process CPU time or return -1 
otherwise.
+   *  The CPU time value is returned in nanoseconds.
+   *  It will use proprietary extensions as 
com.sun.management.OperatingSystemMXBean or
+   *  com.ibm.lang.management.OperatingSystemMXBean if available
+   */
+  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+  val name = new ObjectName("java.lang", "type", "OperatingSystem")
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new 
Gauge[Long] {
--- End diff --

nit: `name("executorCPUTime" )` -> `name("executorCPUTime")`
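
For context on the quoted scaladoc, here is a hedged sketch of one way such a 
gauge can read process CPU time. It uses a direct cast to the 
`com.sun.management` extension rather than the MBean attribute lookup the diff 
sets up, so treat it as an illustration, not the PR's implementation:

```scala
import java.lang.management.ManagementFactory
import com.codahale.metrics.Gauge

// Illustrative gauge: report JVM process CPU time in nanoseconds, or -1
// when the proprietary OperatingSystemMXBean extension is unavailable.
val processCpuTime = new Gauge[Long] {
  override def getValue: Long = ManagementFactory.getOperatingSystemMXBean match {
    case os: com.sun.management.OperatingSystemMXBean => os.getProcessCpuTime
    case _ => -1L
  }
}
```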



---




[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22302
  
**[Test build #95572 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95572/testReport)**
 for PR 22302 at commit 
[`c3c5af2`](https://github.com/apache/spark/commit/c3c5af2b9f27e4effcc8f44f1b98f24a6d1fafa7).


---




[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...

2018-08-31 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22302
  
ok to test.


---




[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...

2018-08-31 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22302
  
@fjh100456 Could you clean up the PR description so that it is easier for 
others to understand?


---




[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...

2018-08-31 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22302
  
@gatorsmile @ueshin Can you trigger this? I checked the related JIRA and 
code, and I think these tests should pass in master when `usingCTAS` is true.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95567/
Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95567 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95567/testReport)**
 for PR 22192 at commit 
[`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95565/
Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95565 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95565/testReport)**
 for PR 22192 at commit 
[`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20637


---




[GitHub] spark issue #22306: [SPARK-25300][CORE] Unified the configuration parameter `...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2755/
Test PASSed.


---




[GitHub] spark issue #22306: [SPARK-25300][CORE] Unified the configuration parameter `...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22306
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22306: [SPARK-25300][CORE] Unified the configuration parameter `...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22306
  
**[Test build #95571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95571/testReport)**
 for PR 22306 at commit 
[`82525d7`](https://github.com/apache/spark/commit/82525d753b80aef856217b9a161966a7ad499eca).


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-31 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20637
  
Thanks! merging to master.


---




[GitHub] spark pull request #22306: [SPARK-25300][CORE] Unified the configuration para...

2018-08-31 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/22306

[SPARK-25300][CORE] Unified the configuration parameter 
`spark.shuffle.service.enabled`

## What changes were proposed in this pull request?

The configuration parameter "spark.shuffle.service.enabled" is defined in 
`package.scala` and is used in many places, so we can replace the raw string 
with `SHUFFLE_SERVICE_ENABLED`. This PR also unifies the configuration 
parameter "spark.shuffle.service.port" in the same way.
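
For illustration, this is the `ConfigBuilder` pattern the description refers 
to; a sketch of the constant's shape in `package.scala` and of a call site 
(simplified, and `conf.get(entry)` is Spark-internal API):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.internal.config._

// The typed entry defined in package.scala looks like this (quoted from memory):
//   private[spark] val SHUFFLE_SERVICE_ENABLED =
//     ConfigBuilder("spark.shuffle.service.enabled").booleanConf.createWithDefault(false)

// Call sites then read the typed constant instead of repeating the raw string:
val conf = new SparkConf()
val shuffleServiceEnabled: Boolean = conf.get(SHUFFLE_SERVICE_ENABLED)
```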

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark unifiedserviceenable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22306.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22306


commit 82525d753b80aef856217b9a161966a7ad499eca
Author: liuxian 
Date:   2018-09-01T03:09:08Z

fix




---




[GitHub] spark pull request #21968: [SPARK-24999][SQL] Reduce unnecessary 'new' memory...

2018-08-31 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214502781
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala
 ---
@@ -48,6 +48,8 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
 
+val numVarLenFields = groupingKeys.map(_.dataType).count(dt => 
!UnsafeRow.isFixedLength(dt))
--- End diff --

@cloud-fan We would like to discuss this: how should we modify it?


---




[GitHub] spark pull request #21968: [SPARK-24999][SQL] Reduce unnecessary 'new' memory...

2018-08-31 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214502585
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala
 ---
@@ -48,6 +48,8 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
 
+val numVarLenFields = groupingKeys.map(_.dataType).count(dt => 
!UnsafeRow.isFixedLength(dt))
--- End diff --

@cloud-fan We would like to discuss this: how should we modify it?


---




[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95566/
Test PASSed.


---




[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22204
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22204
  
**[Test build #95566 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95566/testReport)**
 for PR 22204 at commit 
[`929bc2d`](https://github.com/apache/spark/commit/929bc2d9eeaeafebca54deb07c6a428a8b661380).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread ifilonenko
Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22298
  
@mccheah as such, that should be a follow-up PR, and this should be good to 
merge as long as @holdenk gives an LGTM.


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22227
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95563/
Test FAILed.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214500639
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
--- End diff --

many thanks! ya, plz ping me to review ;)


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22227
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22227
  
**[Test build #95563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95563/testReport)**
 for PR 22227 at commit 
[`d80b1a1`](https://github.com/apache/spark/commit/d80b1a15ed8941bad78df2c5f7168a4196d27be4).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499281
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
--- End diff --

@maropu You are right!! I will try to optimize this in the other PR I am 
going to open. Please check if you like it.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499216
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
   df5.select(map_from_arrays($"k", $"v")).collect
--- End diff --

@maropu I will double check and get back. But I think it was SparkException.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499155
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
-  ${ev.isNull} = true;
+  ${setIsNullCode}
 } else if (${ctx.genEqual(right.dataType, value, getValue)}) {
-  ${ev.isNull} = false;
+  ${unsetIsNullCode}
--- End diff --

@maropu will change


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499145
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 
   test("SPARK-9083: sort with non-deterministic expressions") {
 import org.apache.spark.util.random.XORShiftRandom
--- End diff --

@maropu will do.


---




[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22270
  
@cloud-fan Thank you, Wenchen. Do we want to fix the two codegen compile 
errors in 2.4? One is in ArrayContains and the other is in ArraySort.
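
For readers following along, the failure mode behind those codegen compile 
errors: when an expression is non-nullable, codegen binds `ev.isNull` to the 
literal `"false"` rather than to a variable, so an unconditionally generated 
assignment becomes invalid Java such as `false = true;`. A small sketch of the 
guard pattern the ArrayContains diff above uses (names are illustrative):

```scala
// Illustrative only: mimic how a codegen template decides whether to emit an
// isNull assignment. isNullTerm is either a variable name or the literal "false".
def isNullAssignment(nullable: Boolean, isNullTerm: String, value: Boolean): String =
  if (nullable) s"$isNullTerm = $value;" else ""

println(isNullAssignment(nullable = true, isNullTerm = "isNull_0", value = true)) // isNull_0 = true;
println(isNullAssignment(nullable = false, isNullTerm = "false", value = true))   // "" (nothing emitted)
```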


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2754/
Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2754/



---




[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22270
  
Thanks for looking into this!

I think we have to clean up our test framework later (after 2.4). We should 
identify the test cases that are actually testing the expressions, and run them 
with and without the local relation optimization enabled, to exercise both the 
codegen and interpreted code paths.

Since the current test suites are a little messy, reorganizing them will be a 
lot of work. I'm looking forward to seeing us accomplish it in Spark 3.0!


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2754/



---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22298
  
https://issues.apache.org/jira/browse/SPARK-25291 looks like a real issue 
with the way the tests are written. So we don't necessarily want to ignore it 
for this patch, but we're still thinking about it.


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-31 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r214497013
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -229,36 +229,74 @@ case class RLike(left: Expression, right: Expression) 
extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match 
`regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences 
that match `regex`" +
+" and returns an array of at most `limit`",
+  arguments = """
+Arguments:
+  * str - a string expression to split.
+  * regex - a string representing a regular expression. The regex 
string should be a
+Java regular expression.
+  * limit - an integer expression which controls the number of times 
the regex is applied.
+
+limit > 0: The resulting array's length will not be more than 
`limit`,
+   and the resulting array's last entry will contain all 
input
+   beyond the last matched regex.
+limit <= 0: `regex` will be applied as many times as possible, and
+the resulting array can be of any size.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
+  > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1);
+   ["one","two","three",""]
+  > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2);
+   ["one","twoBthreeC"]
   """)
-case class StringSplit(str: Expression, pattern: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
+case class StringSplit(str: Expression, regex: Expression, limit: 
Expression)
+  extends TernaryExpression with ImplicitCastInputTypes {
 
-  override def left: Expression = str
-  override def right: Expression = pattern
   override def dataType: DataType = ArrayType(StringType)
-  override def inputTypes: Seq[DataType] = Seq(StringType, StringType)
+  override def inputTypes: Seq[DataType] = Seq(StringType, StringType, 
IntegerType)
+  override def children: Seq[Expression] = str :: regex :: limit :: Nil
+
+  def this(exp: Expression, regex: Expression) = this(exp, regex, 
Literal(-1));
 
-  override def nullSafeEval(string: Any, regex: Any): Any = {
-val strings = 
string.asInstanceOf[UTF8String].split(regex.asInstanceOf[UTF8String], -1)
+  override def nullSafeEval(string: Any, regex: Any, limit: Any): Any = {
+val strings = string.asInstanceOf[UTF8String].split(
+  regex.asInstanceOf[UTF8String], 
maybeFallbackLimitValue(limit.asInstanceOf[Int]))
 new GenericArrayData(strings.asInstanceOf[Array[Any]])
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val arrayClass = classOf[GenericArrayData].getName
-nullSafeCodeGen(ctx, ev, (str, pattern) =>
+nullSafeCodeGen(ctx, ev, (str, regex, limit) => {
   // Array in java is covariant, so we don't need to cast UTF8String[] 
to Object[].
-  s"""${ev.value} = new $arrayClass($str.split($pattern, -1));""")
+  s"""${ev.value} = new $arrayClass($str.split(
+ $regex,${handleCodeGenLimitFallback(limit)}));""".stripMargin
+})
   }
 
   override def prettyName: String = "split"
+
+  /**
+   * Java String's split method supports "ignore empty string" behavior 
when the limit is 0.
+   * To avoid this, we fall back to -1 when the limit is 0. Otherwise, 
this is a noop.
+   */
+  def maybeFallbackLimitValue(limit: Int): Int = {
--- End diff --

+1, and please add a `limit = 0` case to `UTF8StringSuite`.
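
For reference, a quick sketch of the limit semantics under discussion, using 
Java's `String.split` (which the quoted `UTF8String.split` mirrors); this is 
an illustration, not the PR's code:

```scala
object SplitLimitDemo extends App {
  val s = "oneAtwoBthreeC"
  // limit = -1: apply the regex as many times as possible; trailing empty strings kept.
  println(s.split("[ABC]", -1).mkString("[", ",", "]")) // [one,two,three,]
  // limit = 2: at most 2 elements; the last element carries the rest of the input.
  println(s.split("[ABC]", 2).mkString("[", ",", "]"))  // [one,twoBthreeC]
  // limit = 0: Java drops trailing empty strings -- the behavior that
  // maybeFallbackLimitValue avoids by mapping 0 to -1.
  println(s.split("[ABC]", 0).mkString("[", ",", "]"))  // [one,two,three]
}
```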


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
**[Test build #95570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95570/testReport)**
 for PR 22298 at commit 
[`3ad1324`](https://github.com/apache/spark/commit/3ad13242f0f5c5d34cda8ca5d0e0ad0b1bdbcead).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95570/
Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread ifilonenko
Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22298
  
@holdenk and @mccheah any other comments before merge? 


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
**[Test build #95570 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95570/testReport)**
 for PR 22298 at commit 
[`3ad1324`](https://github.com/apache/spark/commit/3ad13242f0f5c5d34cda8ca5d0e0ad0b1bdbcead).


---




[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-31 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22298#discussion_r214495391
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
 ---
@@ -51,6 +51,7 @@ private[spark] class BasicDriverFeatureStep(
 .get(DRIVER_MEMORY_OVERHEAD)
 .getOrElse(math.max((conf.get(MEMORY_OVERHEAD_FACTOR) * 
driverMemoryMiB).toInt,
   MEMORY_OVERHEAD_MIN_MIB))
+  // TODO: Have memory limit checks on driverMemory
--- End diff --

Valid point. These are not necessary


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread NiharS
Github user NiharS commented on the issue:

https://github.com/apache/spark/pull/22192
  
These tests are failing locally for me, both with my code and without it, on a 
clean pull of master.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95559/
Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95559 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95559/testReport)**
 for PR 22192 at commit 
[`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21743: [SPARK-24767][Launcher] Propagate MDC to spark-submit th...

2018-08-31 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21743
  
also, I don't recall anywhere in Spark that depends on or sets MDC...


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95560/
Test PASSed.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20637
  
**[Test build #95560 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95560/testReport)**
 for PR 20637 at commit 
[`88c74c6`](https://github.com/apache/spark/commit/88c74c61d5ab64eb860b91261d013702983ac49c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22266: [SPARK-25270] lint-python: Add flake8 to find syn...

2018-08-31 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22266#discussion_r214494032
  
--- Diff: dev/lint-python ---
@@ -82,6 +82,25 @@ else
 rm "$PYCODESTYLE_REPORT_PATH"
 fi
 
+# stop the build if there are Python syntax errors or undefined names
+flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
--statistics
--- End diff --

Yea, sounds good.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2753/
Test PASSed.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22215
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95569/
Test PASSed.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22215
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2753/



---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22215
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22215
  
**[Test build #95569 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95569/testReport)**
 for PR 22215 at commit 
[`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95568/
Test PASSed.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22215
  
**[Test build #95568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95568/testReport)**
 for PR 22215 at commit 
[`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22215
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22215
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2753/



---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/22298
  
Looks fine to me, but I'm not familiar enough with the K8S code to have 
much of an opinion.


---




[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-31 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22298#discussion_r214492297
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/PythonTestsSuite.scala
 ---
@@ -72,12 +72,33 @@ private[spark] trait PythonTestsSuite { k8sSuite: 
KubernetesSuite =>
   isJVM = false,
   pyFiles = Some(PYSPARK_CONTAINER_TESTS))
   }
+
+  test("Run PySpark with memory customization", k8sTestTag) {
+sparkAppConf
+  .set("spark.kubernetes.container.image", 
s"${getTestImageRepo}/spark-py:${getTestImageTag}")
--- End diff --

Some of this stuff can be factored out, I think; we just haven't done so 
yet. I wouldn't block a merge of this on such a refactor, but this entire test 
class could probably use some cleanup with respect to how the code is 
structured.


---




[GitHub] spark pull request #21968: [SPARK-24999][SQL] Reduce unnecessary 'new' memory...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214492145
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala
 ---
@@ -48,6 +48,8 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
 
+val numVarLenFields = groupingKeys.map(_.dataType).count(dt => 
!UnsafeRow.isFixedLength(dt))
--- End diff --

plz keep the comment `// TODO: consider large decimal and interval type` 
below


---




[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-31 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22298#discussion_r214492080
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/JavaDriverFeatureStep.scala
 ---
@@ -38,7 +39,8 @@ private[spark] class JavaDriverFeatureStep(
   .build()
 SparkPod(pod.pod, withDriverArgs)
   }
-  override def getAdditionalPodSystemProperties(): Map[String, String] = 
Map.empty
+  override def getAdditionalPodSystemProperties(): Map[String, String] =
--- End diff --

Again, I'm not sure what our direction is going to be with respect to mixed 
pipelines throughout the cluster manager sections. If we should be supporting 
them in a first-class way, then perhaps we should file a JIRA and discuss how 
the submission client should be refactored to support that.


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22215
  
**[Test build #95569 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95569/testReport)**
 for PR 22215 at commit 
[`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e).


---




[GitHub] spark pull request #21968: [SPARK-24999][SQL] Reduce unnecessary 'new' memory...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214491914
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala
 ---
@@ -48,6 +48,8 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
 
+val numVarLenFields = groupingKeys.map(_.dataType).count(dt => 
!UnsafeRow.isFixedLength(dt))
--- End diff --

super nit: `.count(!UnsafeRow.isFixedLength(_))`?


---




[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-31 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22298#discussion_r214491727
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
 ---
@@ -51,6 +51,7 @@ private[spark] class BasicDriverFeatureStep(
 .get(DRIVER_MEMORY_OVERHEAD)
 .getOrElse(math.max((conf.get(MEMORY_OVERHEAD_FACTOR) * 
driverMemoryMiB).toInt,
   MEMORY_OVERHEAD_MIN_MIB))
+  // TODO: Have memory limit checks on driverMemory
--- End diff --

Hm can you elaborate here? We already set the driver memory limit in this 
step based on the overhead.
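
For reference, a worked sketch of the overhead-based limit mentioned here, 
assuming the defaults MEMORY_OVERHEAD_FACTOR = 0.1 and 
MEMORY_OVERHEAD_MIN_MIB = 384 (values are illustrative, not the exact code):

```scala
// Illustrative computation of the driver pod's memory request/limit.
val driverMemoryMiB = 4096L
val overheadFactor = 0.1
val overheadMinMiB = 384L
val overheadMiB = math.max((overheadFactor * driverMemoryMiB).toLong, overheadMinMiB)
// overheadMiB = max(409, 384) = 409
val podMemoryLimitMiB = driverMemoryMiB + overheadMiB // 4505 MiB
```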


---




[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214491757
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-storage-api</artifactId>
        </exclusion>
+       <exclusion>
+         <groupId>com.esotericsoftware</groupId>
+         <artifactId>kryo-shaded</artifactId>
+       </exclusion>
--- End diff --

Yes, I checked that, @srowen. `org.apache.orc` only uses the Kryo constructor, 
`writeObject`, and `readObject` from the `kryo-shaded` library. There is no 
change to them.

**WRITE**
```
(new Kryo()).writeObject(out, sarg);
```
**READ**
```
... = (new Kryo()).readObject(new Input(sargBytes), 
SearchArgumentImpl.class);
```


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22215
  
**[Test build #95568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95568/testReport)**
 for PR 22215 at commit 
[`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e).


---




[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...

2018-08-31 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22215#discussion_r214491272
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
 ---
@@ -151,13 +152,15 @@ private[spark] class ExecutorPodsLifecycleManager(
 
   private def exitReasonMessage(podState: FinalPodState, execId: Long, 
exitCode: Int) = {
 val pod = podState.pod
+val reason = Option(pod.getStatus.getReason)
+val message = Option(pod.getStatus.getMessage)
 s"""
|The executor with id $execId exited with exit code $exitCode.
-   |The API gave the following brief reason: ${pod.getStatus.getReason}
-   |The API gave the following message: ${pod.getStatus.getMessage}
+   |The API gave the following brief reason: ${reason.getOrElse("")}
--- End diff --

Maybe default to `N/A`? It might be confusing if left blank.


---




[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...

2018-08-31 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22215#discussion_r214490999
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 ---
@@ -60,4 +64,81 @@ private[spark] object KubernetesUtils {
   }
 
   def parseMasterUrl(url: String): String = url.substring("k8s://".length)
+
+  def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : 
String = {
+// Use more loggable format if value is null or empty
+val indentStr = "\t" * indent
--- End diff --

Can we prefer space-based indentation? Curious as to whether others have an 
opinion about this.


---




[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...

2018-08-31 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/22215#discussion_r214491177
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 ---
@@ -60,4 +64,81 @@ private[spark] object KubernetesUtils {
   }
 
   def parseMasterUrl(url: String): String = url.substring("k8s://".length)
+
+  def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : 
String = {
+// Use more loggable format if value is null or empty
+val indentStr = "\t" * indent
+pairs.map {
+  case (k, v) => s"\n$indentStr $k: 
${Option(v).filter(_.nonEmpty).getOrElse("N/A")}"
+}.mkString("")
+  }
+
+  /**
+   * Given a pod, output a human readable representation of its state
+   *
+   * @param pod Pod
+   * @return Human readable pod state
+   */
+  def formatPodState(pod: Pod): String = {
+val details = Seq[(String, String)](
+  // pod metadata
+  ("pod name", pod.getMetadata.getName),
+  ("namespace", pod.getMetadata.getNamespace),
+  ("labels", pod.getMetadata.getLabels.asScala.mkString(", ")),
+  ("pod uid", pod.getMetadata.getUid),
+  ("creation time", formatTime(pod.getMetadata.getCreationTimestamp)),
+
+  // spec details
+  ("service account name", pod.getSpec.getServiceAccountName),
+  ("volumes", 
pod.getSpec.getVolumes.asScala.map(_.getName).mkString(", ")),
+  ("node name", pod.getSpec.getNodeName),
+
+  // status
+  ("start time", formatTime(pod.getStatus.getStartTime)),
+  ("phase", pod.getStatus.getPhase),
+  ("container status", containersDescription(pod, 2))
+)
+
+formatPairsBundle(details)
+  }
+
+  def containersDescription(p: Pod, indent: Int = 1): String = {
+p.getStatus.getContainerStatuses.asScala.map { status =>
+  Seq(
+("container name", status.getName),
+("container image", status.getImage)) ++
+containerStatusDescription(status)
+}.map(p => formatPairsBundle(p, indent)).mkString("\n\n")
+  }
+
+  def containerStatusDescription(containerStatus: ContainerStatus)
+: Seq[(String, String)] = {
+val state = containerStatus.getState
+Option(state.getRunning)
--- End diff --

This is a really cool use of partial functions - wonder if there's other 
places where we should be matching this way (of course doesn't have to be done 
here).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-08-31 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22215
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214491144
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -229,36 +229,74 @@ case class RLike(left: Expression, right: Expression) 
extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match 
`regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences 
that match `regex`" +
+" and returns an array of at most `limit`",
+  arguments = """
+Arguments:
+  * str - a string expression to split.
+  * regex - a string representing a regular expression. The regex 
string should be a
+Java regular expression.
+  * limit - an integer expression which controls the number of times 
the regex is applied.
+
+limit > 0: The resulting array's length will not be more than 
`limit`,
+   and the resulting array's last entry will contain all 
input
+   beyond the last matched regex.
+limit <= 0: `regex` will be applied as many times as possible, and
+the resulting array can be of any size.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
+  > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1);
+   ["one","two","three",""]
+  > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2);
+   ["one","twoBthreeC"]
   """)
-case class StringSplit(str: Expression, pattern: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
+case class StringSplit(str: Expression, regex: Expression, limit: 
Expression)
+  extends TernaryExpression with ImplicitCastInputTypes {
 
-  override def left: Expression = str
-  override def right: Expression = pattern
   override def dataType: DataType = ArrayType(StringType)
-  override def inputTypes: Seq[DataType] = Seq(StringType, StringType)
+  override def inputTypes: Seq[DataType] = Seq(StringType, StringType, 
IntegerType)
+  override def children: Seq[Expression] = str :: regex :: limit :: Nil
+
+  def this(exp: Expression, regex: Expression) = this(exp, regex, 
Literal(-1));
 
-  override def nullSafeEval(string: Any, regex: Any): Any = {
-val strings = 
string.asInstanceOf[UTF8String].split(regex.asInstanceOf[UTF8String], -1)
+  override def nullSafeEval(string: Any, regex: Any, limit: Any): Any = {
+val strings = string.asInstanceOf[UTF8String].split(
+  regex.asInstanceOf[UTF8String], 
maybeFallbackLimitValue(limit.asInstanceOf[Int]))
 new GenericArrayData(strings.asInstanceOf[Array[Any]])
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val arrayClass = classOf[GenericArrayData].getName
-nullSafeCodeGen(ctx, ev, (str, pattern) =>
+nullSafeCodeGen(ctx, ev, (str, regex, limit) => {
   // Array in java is covariant, so we don't need to cast UTF8String[] 
to Object[].
-  s"""${ev.value} = new $arrayClass($str.split($pattern, -1));""")
+  s"""${ev.value} = new $arrayClass($str.split(
+ $regex,${handleCodeGenLimitFallback(limit)}));""".stripMargin
+})
   }
 
   override def prettyName: String = "split"
+
+  /**
+   * Java String's split method supports "ignore empty string" behavior 
when the limit is 0.
+   * To avoid this, we fall back to -1 when the limit is 0. Otherwise, 
this is a noop.
+   */
+  def maybeFallbackLimitValue(limit: Int): Int = {
--- End diff --

To make all the split behaviour consistent in Spark, how about trying to 
fix code in `UTF8String.split` directly? (We need to check if the change does 
not break the existing behaviour, too) cc: @ueshin @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22257: [SPARK-25264][K8S] Fix comma-delineated arguments...

2018-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22257


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

2018-08-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22048#discussion_r214490659
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
   }
 }
   }
+
+  /**
+   * Regular expression matching full width characters
+   */
+  private val fullWidthRegex = ("""[""" +
+// scalastyle:off nonascii
+"""\u1100-\u115F""" +
+"""\u2E80-\uA4CF""" +
+"""\uAC00-\uD7A3""" +
+"""\uF900-\uFAFF""" +
+"""\uFE10-\uFE19""" +
+"""\uFE30-\uFE6F""" +
+"""\uFF00-\uFF60""" +
+"""\uFFE0-\uFFE6""" +
--- End diff --

A general question. 
- How to get this Regex list? Any reference?  It sounds like this should be 
a general problem
- What is the performance impact?

Can you answer them and post them in the PR description?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22257: [SPARK-25264][K8S] Fix comma-delineated arguments passed...

2018-08-31 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22257
  
Looks good, thanks. Merging.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214490061
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  /** Dropwizard metrics gauge measuring the executor's process CPU time.
+   *  This code will try to get JVM Process CPU time or return -1 
otherwise.
+   *  The CPU time value is returned in nanoseconds.
+   *  It will use proprietary extensions as 
com.sun.management.OperatingSystemMXBean or
+   *  com.ibm.lang.management.OperatingSystemMXBean if available
+   */
+  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+  val name = new ObjectName("java.lang", "type", "OperatingSystem")
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new 
Gauge[Long] {
+override def getValue: Long = {
+  try {
+val attribute = mBean.getAttribute(name, "ProcessCpuTime")
+if (attribute != null) {
+  attribute.asInstanceOf[Long]
+} else {
+  -1L
--- End diff --

Any reason to return -1 instead of 0?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214489886
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  /** Dropwizard metrics gauge measuring the executor's process CPU time.
+   *  This code will try to get JVM Process CPU time or return -1 
otherwise.
+   *  The CPU time value is returned in nanoseconds.
+   *  It will use proprietary extensions as 
com.sun.management.OperatingSystemMXBean or
+   *  com.ibm.lang.management.OperatingSystemMXBean if available
+   */
+  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+  val name = new ObjectName("java.lang", "type", "OperatingSystem")
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new 
Gauge[Long] {
--- End diff --

a little confused with the exsiting `cpuTime`. How about `jvmCpuTime`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214489390
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  /** Dropwizard metrics gauge measuring the executor's process CPU time.
+   *  This code will try to get JVM Process CPU time or return -1 
otherwise.
+   *  The CPU time value is returned in nanoseconds.
+   *  It will use proprietary extensions as 
com.sun.management.OperatingSystemMXBean or
+   *  com.ibm.lang.management.OperatingSystemMXBean if available
+   */
+  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+  val name = new ObjectName("java.lang", "type", "OperatingSystem")
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new 
Gauge[Long] {
+override def getValue: Long = {
+  try {
+val attribute = mBean.getAttribute(name, "ProcessCpuTime")
+if (attribute != null) {
+  attribute.asInstanceOf[Long]
+} else {
+  -1L
+}
+  } catch {
+case _ : Exception => -1L
--- End diff --

`case NonFatal(_) => -1`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95567 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95567/testReport)**
 for PR 22192 at commit 
[`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22204
  
**[Test build #95566 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95566/testReport)**
 for PR 22204 at commit 
[`929bc2d`](https://github.com/apache/spark/commit/929bc2d9eeaeafebca54deb07c6a428a8b661380).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22204
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...

2018-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2752/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22304: [SPARK-25297][Streaming][Test] Fix blocking unit tests f...

2018-08-31 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22304
  
The one concern is whether this behavior change impacts anything else. Now 
some exceptions will cause an actual exception in a caller to methods like 
Await, rather than just logging and continuing. That might even be a good 
thing, and is crucial here. But that's what I have in mind. That said, tests 
all pass.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22192
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r214486314
  
--- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java ---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.api.java.JavaSparkContext;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+// Tests loading plugins into executors
+public class ExecutorPluginSuite {
+  // Static value modified by testing plugin to ensure plugin loaded 
correctly.
+  public static int numSuccessfulPlugins = 0;
+  // Static value modified by testing plugin to verify plugins shut down 
properly.
+  public static int numSuccessfulTerminations = 0;
+  private JavaSparkContext sc;
+
+  private String EXECUTOR_PLUGIN_CONF_NAME = "spark.executor.plugins";
+  private String testPluginName = TestExecutorPlugin.class.getName();
--- End diff --

final static


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r214486250
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -604,4 +604,14 @@ package object config {
   .intConf
   .checkValue(v => v > 0, "The max failures should be a positive 
value.")
   .createWithDefault(40)
+
+  private[spark] val EXECUTOR_PLUGINS =
+ConfigBuilder("spark.executor.plugins")
+  .doc("Comma-separated list of class names for \"plugins\" 
implementing " +
+"org.apache.spark.ExecutorPlugin.  Plugins have the same 
privileges as any task " +
+"in a spark executor.  They can also interfere with task execution 
and fail in " +
--- End diff --

Spark


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r214486385
  
--- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java ---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.api.java.JavaSparkContext;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+// Tests loading plugins into executors
+public class ExecutorPluginSuite {
+  // Static value modified by testing plugin to ensure plugin loaded 
correctly.
+  public static int numSuccessfulPlugins = 0;
+  // Static value modified by testing plugin to verify plugins shut down 
properly.
+  public static int numSuccessfulTerminations = 0;
+  private JavaSparkContext sc;
--- End diff --

Move after the variables below (which should be "final static").


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95565/testReport)**
 for PR 22192 at commit 
[`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r214486075
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -136,6 +136,15 @@ private[spark] class Executor(
   // for fetching remote cached RDD blocks, so need to make sure it uses 
the right classloader too.
   env.serializerManager.setDefaultClassLoader(replClassLoader)
 
+  private val pluginList = conf.get(EXECUTOR_PLUGINS)
+  if (pluginList != Nil) logDebug(s"Loading the following plugins: 
${pluginList.mkString(", ")}")
+  // Load executor plugins
+  Thread.currentThread().setContextClassLoader(replClassLoader)
+  private val executorPlugins =
+Utils.loadExtensions(classOf[ExecutorPlugin], pluginList, conf)
+  executorPlugins.foreach(_.init())
+  if (pluginList != Nil) logDebug("Finished loading plugins")
--- End diff --

Same.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   >