date:20161004

[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15292
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15292
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66365/



[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15292
  
**[Test build #66365 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66365/consoleFull)**
 for PR 15292 at commit 
[`3fa9f43`](https://github.com/apache/spark/commit/3fa9f43686f1195a9f86ab1bcda054119c332a20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15231: [SPARK-17658][SPARKR] read.df/write.df API taking path o...

2016-10-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15231
  
I think it would also be great to add two examples: one for read.df and one 
for write.df without the path parameter (like a JDBC one).




[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15359
  
**[Test build #66377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66377/consoleFull)**
 for PR 15359 at commit 
[`0d9a9c7`](https://github.com/apache/spark/commit/0d9a9c74ca714b1df3dde50f2c0386a4a974fa73).



[GitHub] spark pull request #15231: [SPARK-17658][SPARKR] read.df/write.df API taking...

2016-10-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15231#discussion_r81904325
  
--- Diff: R/pkg/inst/tests/testthat/test_utils.R ---
@@ -167,10 +167,13 @@ test_that("convertToJSaveMode", {
 })
 
 test_that("captureJVMException", {
-  expect_error(tryCatch(callJStatic("org.apache.spark.sql.api.r.SQLUtils", "getSQLDataType",
+  method <- "getSQLDataType"
+  expect_error(tryCatch(callJStatic("org.apache.spark.sql.api.r.SQLUtils", method,
--- End diff --

Let's change this test to use `handledCallJStatic` too, in a follow-up?



[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.

2016-10-04 Thread yanboliang

Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15359
  
cc @zhengruifeng @sethah 



[GitHub] spark pull request #15359: [Minor][ML] Avoid 2D array flatten in NB training...

2016-10-04 Thread yanboliang

GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/15359

[Minor][ML] Avoid 2D array flatten in NB training.

## What changes were proposed in this pull request?
Avoid flattening a 2D array in `NaiveBayes` training, since the `flatten` method 
can be expensive (it creates another array and copies the data into it).
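The patch itself is not quoted in this message, so the following is only a minimal sketch of the idea, assuming theta is a `numLabels` by `numFeatures` matrix stored row-major. `ThetaLayoutSketch`, `viaFlatten`, and `direct` are hypothetical names, not Spark's code. The first variant fills a 2D array and then calls `flatten`; the second writes each value straight into one preallocated flat array.

```scala
object ThetaLayoutSketch {
  // Variant the PR moves away from: fill a 2D array, then call `flatten`,
  // which allocates a second array of numLabels * numFeatures and copies into it.
  def viaFlatten(numLabels: Int, numFeatures: Int)(f: (Int, Int) => Double): Array[Double] = {
    val theta = Array.ofDim[Double](numLabels, numFeatures)
    for (i <- 0 until numLabels; j <- 0 until numFeatures) {
      theta(i)(j) = f(i, j)
    }
    theta.flatten
  }

  // Variant without flatten: allocate one flat, row-major array up front
  // and write each value directly at offset i * numFeatures + j.
  def direct(numLabels: Int, numFeatures: Int)(f: (Int, Int) => Double): Array[Double] = {
    val thetaArray = new Array[Double](numLabels * numFeatures)
    var i = 0
    while (i < numLabels) {
      var j = 0
      while (j < numFeatures) {
        thetaArray(i * numFeatures + j) = f(i, j)
        j += 1
      }
      i += 1
    }
    thetaArray
  }
}
```

Both variants return the same row-major array; the second skips the intermediate 2D allocation and the extra copy that `flatten` performs.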

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark nb-theta

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15359.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15359


commit 0d9a9c74ca714b1df3dde50f2c0386a4a974fa73
Author: Yanbo Liang 
Date:   2016-10-05T05:39:13Z

Avoid 2D array flatten in NB training.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15258
  
**[Test build #66374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66374/consoleFull)**
 for PR 15258 at commit 
[`01cb666`](https://github.com/apache/spark/commit/01cb6664ea9ea2da7bc861432c19e3ac14ede524).



[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15262
  
**[Test build #66373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66373/consoleFull)**
 for PR 15262 at commit 
[`3a1cd22`](https://github.com/apache/spark/commit/3a1cd221402f4ade6b496996b81665ad19ce3e86).



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15355
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15355
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66361/



[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14151
  
**[Test build #66375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66375/consoleFull)**
 for PR 14151 at commit 
[`e263b15`](https://github.com/apache/spark/commit/e263b1508a77424b371a0796ea4f9c05bc1c0121).



[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14087
  
**[Test build #66376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66376/consoleFull)**
 for PR 14087 at commit 
[`ecdf653`](https://github.com/apache/spark/commit/ecdf6539c8c19da3f019601309993fde634d6c22).



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15355
  
**[Test build #66361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66361/consoleFull)**
 for PR 15355 at commit 
[`b7074d4`](https://github.com/apache/spark/commit/b7074d48159804035eaf00e1abed35e408684b42).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15354
  
**[Test build #66372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66372/consoleFull)**
 for PR 15354 at commit 
[`5f185e3`](https://github.com/apache/spark/commit/5f185e36aba86865e2cae772351e90fb8bec6492).



[GitHub] spark issue #12355: [SPARK-14344][SQL] Not creating meta files when summary-...

2016-10-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/12355
  
It seems to be working fine now, so this does not seem to be a problem anymore.

```scala
import java.io.File

import org.apache.parquet.hadoop.ParquetOutputFormat

// Assumes a Spark SQL test suite providing `spark`, `withSQLConf`, and `withTempPath`.
test("SPARK-14344 - write metadata") {
  // Job summaries enabled: a _common_metadata file is expected in the output directory.
  withSQLConf(ParquetOutputFormat.ENABLE_JOB_SUMMARY -> "true") {
    withTempPath { dir =>
      val path = s"${dir.getCanonicalPath}/part-r-0.parquet"
      spark.range(10).write.parquet(path)
      val compressedFiles = new File(path).listFiles()
      assert(compressedFiles.exists(_.getName.endsWith("_common_metadata")))
    }
  }

  // Job summaries disabled: no _common_metadata file should be written.
  withSQLConf(ParquetOutputFormat.ENABLE_JOB_SUMMARY -> "false") {
    withTempPath { dir =>
      val path = s"${dir.getCanonicalPath}/part-r-0.parquet"
      spark.range(10).write.parquet(path)
      val compressedFiles = new File(path).listFiles()
      assert(!compressedFiles.exists(_.getName.endsWith("_common_metadata")))
    }
  }
}
```



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15249
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66358/



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15249
  
**[Test build #66358 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66358/consoleFull)**
 for PR 15249 at commit 
[`89d3c5e`](https://github.com/apache/spark/commit/89d3c5eb44939c38b0be14a6fc10c2139d0126ab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14452
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #66360 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66360/consoleFull)**
 for PR 14452 at commit 
[`cebfbf5`](https://github.com/apache/spark/commit/cebfbf5e3dd7b2d2365e5152991ab7ff2c63dd90).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66362/



[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...

2016-10-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15357#discussion_r81901737
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -265,7 +265,9 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 }
 
 val statement = plan(ctx.statement)
-if (isExplainableStatement(statement)) {
+if (statement == null) {
+  null  // This is enough since ParseException will raise later.
--- End diff --

I added a test case to intercept that. If the builder returns `null`, 
`ParseDriver.scala` recognizes it as a parsing error and raises `Unsupported 
SQL statement`:
```scala
  override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
    astBuilder.visitSingleStatement(parser.singleStatement()) match {
      case plan: LogicalPlan => plan
      // A null result does not match the typed pattern above and ends up here.
      case _ =>
        val position = Origin(None, None)
        throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
    }
  }
```
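The reason a `null` return reaches the `Unsupported SQL statement` branch is that a Scala typed pattern such as `case plan: LogicalPlan` never matches `null`. A minimal, self-contained sketch of that behavior follows; `Plan`, `Describe`, `ParseError`, and `toPlan` are hypothetical stand-ins, not Spark classes.

```scala
object NullMatchSketch {
  // Hypothetical stand-ins for Spark's LogicalPlan hierarchy and ParseException.
  sealed trait Plan
  final case class Describe(table: String) extends Plan
  final class ParseError(msg: String) extends RuntimeException(msg)

  // Same shape as the `match` in parsePlan: a typed pattern (`case plan: Plan`)
  // never matches null, so a null result falls through to the catch-all case.
  def toPlan(result: AnyRef): Plan = result match {
    case plan: Plan => plan
    case _          => throw new ParseError("Unsupported SQL statement")
  }
}
```

Passing a real plan returns it unchanged, while passing `null` throws `ParseError("Unsupported SQL statement")`, which is why returning `null` from the builder is enough to surface a parse error.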



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66362 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66362/consoleFull)**
 for PR 15307 at commit 
[`f5732a5`](https://github.com/apache/spark/commit/f5732a50da7f0df326f52ad9b85da3876ecfafbc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12693: [SPARK-14914] Fix Resource not closed after using, mostl...

2016-10-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/12693
  
@srowen I am willing to proceed with this too, if you approve and @taoli91 is 
not responding.



[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15357
  
**[Test build #66371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66371/consoleFull)**
 for PR 15357 at commit 
[`45e46a9`](https://github.com/apache/spark/commit/45e46a969919c3fb184a3678764fa094054d223a).



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12135
  
Merged build finished. Test PASSed.



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12135
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66359/



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12135
  
**[Test build #66359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66359/consoleFull)**
 for PR 12135 at commit 
[`89ed0cc`](https://github.com/apache/spark/commit/89ed0ccfe22e345655f33fb77b670c4d2309ecd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12135
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12135
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66363/



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12135
  
**[Test build #66363 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66363/consoleFull)**
 for PR 12135 at commit 
[`a475090`](https://github.com/apache/spark/commit/a475090f5424752a1cfe04983d964f6fb85181b0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...

2016-10-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15357#discussion_r81900796
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -265,7 +265,9 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 }
 
 val statement = plan(ctx.statement)
-if (isExplainableStatement(statement)) {
+if (statement == null) {
+  null  // This is enough since ParseException will raise later.
--- End diff --

Sure!



[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...

2016-10-04 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15357#discussion_r81900460
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -265,7 +265,9 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
     }
 
     val statement = plan(ctx.statement)
-    if (isExplainableStatement(statement)) {
+    if (statement == null) {
+      null  // This is enough since ParseException will raise later.
--- End diff --

Where will the ParseException be raised? Could you be a bit more specific
in your comment?

Maybe add a small test for this? Could you also add a few unit tests (not
end-to-end) to SparkSqlParserSuite?


[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15358
  
**[Test build #66370 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66370/consoleFull)**
 for PR 15358 at commit 
[`d3cc470`](https://github.com/apache/spark/commit/d3cc47025df10012940f281af5db94c90fc83917).


[GitHub] spark pull request #15358: Hide Credentials in CREATE and DESC FORMATTED/EXT...

2016-10-04 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/15358

Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC

### What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


### How was this patch tested?
Added test cases

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark maskCredentials

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15358


commit d3cc47025df10012940f281af5db94c90fc83917
Author: gatorsmile 
Date:   2016-10-05T04:37:35Z

fix.




[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15354
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66356/
Test FAILed.


[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15354
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15354
  
**[Test build #66356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66356/consoleFull)**
 for PR 15354 at commit 
[`eec0cd3`](https://github.com/apache/spark/commit/eec0cd32bde8564a080da425be48986055523e8c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66364/
Test PASSed.


[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #66364 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66364/consoleFull)**
 for PR 15314 at commit 
[`423fd51`](https://github.com/apache/spark/commit/423fd5117e32e971e47a02728d6a863a726fc539).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-04 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r81899064
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -525,8 +645,62 @@ class StreamExecution(
   case object TERMINATED extends State
 }
 
-object StreamExecution {
+object StreamExecution extends Logging {
   private val _nextId = new AtomicLong(0)
 
+  /**
+   * Get the number of input rows from the executed plan of the trigger
+   * @param triggerExecutionPlan Execution plan of the trigger
+   * @param triggerLogicalPlan Logical plan of the trigger, generated from the query logical plan
+   * @param sourceToDataframe Source to DataFrame returned by the source.getBatch in this trigger
+   */
+  def getNumInputRowsFromTrigger(
--- End diff --

I managed to improve the test code, so I removed this static method.


[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66369 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66369/consoleFull)**
 for PR 15307 at commit 
[`e708b3b`](https://github.com/apache/spark/commit/e708b3b86a69833169962713ce8bef88bcbdc2f7).


[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15357
  
**[Test build #66368 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66368/consoleFull)**
 for PR 15357 at commit 
[`4b60195`](https://github.com/apache/spark/commit/4b601951e4b3311501363ac4de864c4bf9a1a756).


[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-04 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r81899011
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -525,8 +645,62 @@ class StreamExecution(
   case object TERMINATED extends State
 }
 
-object StreamExecution {
+object StreamExecution extends Logging {
   private val _nextId = new AtomicLong(0)
 
+  /**
+   * Get the number of input rows from the executed plan of the trigger
+   * @param triggerExecutionPlan Execution plan of the trigger
+   * @param triggerLogicalPlan Logical plan of the trigger, generated from the query logical plan
+   * @param sourceToDataframe Source to DataFrame returned by the source.getBatch in this trigger
+   */
+  def getNumInputRowsFromTrigger(
+      triggerExecutionPlan: SparkPlan,
+      triggerLogicalPlan: LogicalPlan,
+      sourceToDataframe: Map[Source, DataFrame]): Map[Source, Long] = {
+
+    // We want to associate execution plan leaves to sources that generate them, so that we match
+    // the their metrics (e.g. numOutputRows) to the sources. To do this we do the following.
+    // Consider the translation from the streaming logical plan to the final executed plan.
+    //
+    //  streaming logical plan (with sources) <==> trigger's logical plan <==> executed plan
+    //
+    // 1. We keep track of streaming sources associated with each leaf in the trigger's logical plan
+    //    - Each logical plan leaf will be associated with a single streaming source.
+    //    - There can be multiple logical plan leaves associated a streaming source.
+    //    - There can be leaves not associated with any streaming source, because they were
+    //      generated from a batch source (e.g. stream-batch joins)
+    //
+    // 2. Assuming that the executed plan has same number of leaves in the same order as that of
+    //    the trigger logical plan, we associate executed plan leaves with corresponding
+    //    streaming sources.
+    //
+    // 3. For each source, we sum the metrics of the associated execution plan leaves.
+    //
+    val logicalPlanLeafToSource = sourceToDataframe.flatMap { case (source, df) =>
+      df.logicalPlan.collectLeaves().map { leaf => leaf -> source }
+    }
+    val allLogicalPlanLeaves = triggerLogicalPlan.collectLeaves() // includes non-streaming sources
+    val allExecPlanLeaves = triggerExecutionPlan.collectLeaves()
+    if (allLogicalPlanLeaves.size == allExecPlanLeaves.size) {
+      val execLeafToSource = allLogicalPlanLeaves.zip(allExecPlanLeaves).flatMap {
+        case (lp, ep) => logicalPlanLeafToSource.get(lp).map { source => ep -> source }
+      }
+      val sourceToNumInputRows = execLeafToSource.map { case (execLeaf, source) =>
+        val numRows = execLeaf.metrics.get("numOutputRows").map(_.value).getOrElse(0L)
+        source -> numRows
+      }
+      sourceToNumInputRows.groupBy(_._1).mapValues(_.map(_._2).sum) // sum up rows for each source
+    } else {
+      def toString[T](seq: Seq[T]): String = s"(size = ${seq.size}), ${seq.mkString(", ")}"
+      logWarning(
+        "Could not report metrics as number leaves in trigger logical plan did not match that" +
--- End diff --

I added 
[`logPeriodicWarning`](https://github.com/apache/spark/pull/15307/commits/e708b3b86a69833169962713ce8bef88bcbdc2f7)
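The leaf-matching procedure described in steps 1 to 3 of the diff comments above can be sketched without Spark. This is only an illustration under stated assumptions: `LogicalLeaf` and `ExecLeaf` are hypothetical stand-ins for Spark's logical and physical plan leaves, and where the reviewed code logs a warning on a leaf-count mismatch, this sketch simply returns `None`. It keeps the same positional assumption, namely that both plans list their leaves in the same order.

```scala
object SourceMetricsSketch {
  // Hypothetical stand-ins for plan leaves (the reviewed code uses LogicalPlan and SparkPlan).
  final case class LogicalLeaf(id: Int)
  final case class ExecLeaf(numOutputRows: Option[Long])

  /**
   * Sums numOutputRows of executed leaves per streaming source.
   * Returns None when the two plans have different numbers of leaves,
   * i.e. when positional matching cannot be trusted.
   */
  def numInputRowsPerSource[S](
      sourceToLeaves: Map[S, Seq[LogicalLeaf]],
      triggerLogicalLeaves: Seq[LogicalLeaf],
      execLeaves: Seq[ExecLeaf]): Option[Map[S, Long]] = {
    // Step 1: each streaming logical leaf points back to the source that produced it.
    val leafToSource: Map[LogicalLeaf, S] =
      sourceToLeaves.toSeq.flatMap { case (source, leaves) => leaves.map(leaf => leaf -> source) }.toMap
    if (triggerLogicalLeaves.size != execLeaves.size) None
    else {
      // Step 2: pair leaves by position; leaves from batch sources have no entry and are dropped.
      val rowsPerLeaf = triggerLogicalLeaves.zip(execLeaves).flatMap { case (lp, ep) =>
        leafToSource.get(lp).map(source => source -> ep.numOutputRows.getOrElse(0L))
      }
      // Step 3: sum the rows of all executed leaves that belong to the same source.
      Some(rowsPerLeaf.groupBy(_._1).map { case (source, rows) => source -> rows.map(_._2).sum }.toMap)
    }
  }
}
```

For example, with sources `"kafka" -> Seq(LogicalLeaf(1), LogicalLeaf(2))` and `"file" -> Seq(LogicalLeaf(3))`, trigger leaves `1, 3, 2, 99` and executed row counts `10, 5, 7, 1000`, the result is `Some(Map("kafka" -> 17, "file" -> 5))`; leaf `99` plays the role of a batch-source leaf and is ignored.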


[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15044
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66357/
Test FAILed.


[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...

2016-10-04 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/15357

[SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

## What changes were proposed in this pull request?

This PR fixes the following NPE scenario.

**Reported Error Scenario**
```
scala> sql("EXPLAIN DESCRIBE TABLE x").show(truncate = false)
INFO SparkSqlParser: Parsing command: EXPLAIN DESCRIBE TABLE x
java.lang.NullPointerException
```
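To make the failure mode concrete, here is a toy model in Scala. The types `Plan`, `Relation`, `ExplainCommand` and the stand-in `ParseException` are hypothetical and are not Spark's classes. The model assumes, as the code comment in this PR's diff suggests, that a builder returns `null` for a statement it cannot plan and that a later parse step rejects `null` with a `ParseException`. The unguarded EXPLAIN path dereferences the `null` and throws an NPE, while the guarded path passes it through so the user sees a parse error instead.

```scala
object NullGuardSketch {
  // Hypothetical toy plan types; these are not Spark's classes.
  sealed trait Plan extends Product { def nodeName: String = productPrefix }
  final case class Relation(table: String) extends Plan
  final case class ExplainCommand(child: Plan, header: String) extends Plan

  final class ParseException(message: String) extends Exception(message)

  // Toy builder: only "SELECT * FROM <table>" is plannable; anything else yields null,
  // mirroring a visitor that returns null for an unsupported statement.
  def plan(statement: String): Plan = statement.split(" ").toList match {
    case List("SELECT", "*", "FROM", table) => Relation(table)
    case _ => null
  }

  // Without a guard: reading statement.nodeName throws a NullPointerException on null.
  def visitExplainUnguarded(inner: String): Plan = {
    val statement = plan(inner)
    ExplainCommand(statement, s"== ${statement.nodeName} ==")
  }

  // With a null guard: pass null through and let the outer parse step report it.
  def visitExplain(inner: String): Plan = {
    val statement = plan(inner)
    if (statement == null) null
    else ExplainCommand(statement, s"== ${statement.nodeName} ==")
  }

  // Outer parse step: a type pattern never matches null, so null reaches the error case.
  def parse(sql: String): Plan = {
    val result =
      if (sql.startsWith("EXPLAIN ")) visitExplain(sql.stripPrefix("EXPLAIN "))
      else plan(sql)
    result match {
      case p: Plan => p
      case _ => throw new ParseException(s"Unsupported SQL statement: $sql")
    }
  }
}
```

In this model, `parse("EXPLAIN DESCRIBE TABLE x")` fails with the toy `ParseException`, whereas `visitExplainUnguarded("DESCRIBE TABLE x")` fails with a `NullPointerException`.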


## How was this patch tested?

Pass the Jenkins test with a new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-17328

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15357.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15357


commit 4b601951e4b3311501363ac4de864c4bf9a1a756
Author: Dongjoon Hyun 
Date:   2016-10-05T04:24:27Z

[SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE




[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15044
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15044
  
**[Test build #66357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66357/consoleFull)**
 for PR 15044 at commit 
[`2b22d12`](https://github.com/apache/spark/commit/2b22d128ef4c51643cd4dcdbe17a1f3d28362a90).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15249
  
**[Test build #66367 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66367/consoleFull)**
 for PR 15249 at commit 
[`a6c863f`](https://github.com/apache/spark/commit/a6c863f2462986b66a93f0beac3bb1f163afa50d).


[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...

2016-10-04 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/15044
  
Thanks. I need to update one file and will do so later today (Japan time).

PR #13758 can also solve this issue without allocating UnsafeArrayData. I 
think that PR #13758 is a generic solution and involves a smaller amount of 
change. Which PR is preferable, #15044 or #13758?


[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-04 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/15249
  
@kayousterhout @mridulm thanks for the feedback. Obviously I still need to 
figure out the timeout issue, but otherwise I think I've addressed everything. 
I will do another pass in the morning.


[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15249#discussion_r81898588
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+private[scheduler] object BlacklistTracker extends Logging {
+
+  private val DEFAULT_TIMEOUT = "1h"
--- End diff --

(longer top-level comment responding to this)


[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-04 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/15249
  
@mridulm on the questions about expiry from blacklists, you are not missing 
anything: this explicitly does not do any timeouts at the taskset level (this 
is mentioned in the design doc). The timeout code you see is mostly just 
incremental work toward https://github.com/apache/spark/pull/14079, but it 
doesn't actually add any value here.

The primary motivation for blacklisting that I've seen is actually quite 
different from the use case you are describing. It's not to help deal with 
resource contention, but to deal with truly broken resources (a bad disk, in all 
the cases I can think of). In fact, in these cases 1 hour is really short; 
users probably want something more like 6-12 hours. But 1 hour really isn't so 
bad: it just means that the bad resources need to be "rediscovered" that often, 
with a scheduling hiccup each time that happens.

This is really different from the use case you are describing, which is a form 
of backoff to deal with resource contention. I have actually talked to a couple 
of different folks about doing something like this recently and think it would 
be great, though I see problems with this approach: it still allows other tasks 
to be scheduled on those executors, and the timeout isn't relative to the task 
runtime, etc.

Nonetheless, an issue here might be that the old option serves some purpose 
that is no longer supported. Do we need to add it back in? Just adding the 
timeout logic again is pretty easy, though
(a) I need to figure out the right place to do it so that it doesn't impact 
scheduling performance

and more importantly

(b) I really worry about being able to configure things so that 
blacklisting can actually handle totally broken resources. E.g., say that you 
set the timeout to 10s. If your tasks take 1 minute each, then your one bad 
executor might cycle through the leftover tasks, fail them all, pass the 
timeout, and repeat that cycle a few times until you go over 
spark.task.maxFailures. I don't see a good way to deal with this while setting 
a sensible timeout for the entire application.
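The failure mode in (b) can be illustrated with a deliberately simplified simulation. Everything in it is an assumption for illustration, not Spark's scheduler: a single bad executor fails each attempt after `secondsPerFailure`, the leftover tasks stay pending for `horizonSec` (the minute the healthy executors need to finish their current tasks), each failure blacklists only that task on that executor for `timeoutSec`, and the job aborts when one task reaches `maxFailures` (4 here, the default of `spark.task.maxFailures`).

```scala
object BlacklistTimeoutSim {
  /**
   * Returns Some((task, second)) for the first task whose failure count reaches
   * maxFailures on the single bad executor, or None if none does by horizonSec.
   */
  def firstAbort(
      numPendingTasks: Int,
      secondsPerFailure: Int,
      timeoutSec: Int,
      maxFailures: Int,
      horizonSec: Int): Option[(Int, Int)] = {
    val failures = Array.fill(numPendingTasks)(0)
    // Second at which each task's blacklist entry for the bad executor expires.
    val blacklistedUntil = Array.fill(numPendingTasks)(Int.MinValue)
    var now = 0
    var aborted: Option[(Int, Int)] = None
    while (aborted.isEmpty && now <= horizonSec) {
      val t = now
      (0 until numPendingTasks).find(task => t >= blacklistedUntil(task)) match {
        case Some(task) =>
          now += secondsPerFailure // the attempt runs on the bad executor and fails
          failures(task) += 1
          blacklistedUntil(task) = now + timeoutSec
          if (failures(task) >= maxFailures) aborted = Some((task, now))
        case None =>
          now += 1 // every pending task is blacklisted here, so the executor idles
      }
    }
    aborted
  }
}
```

With three pending tasks, 5 seconds per failed attempt and a 10s timeout, `firstAbort(3, 5, 10, 4, 60)` reports task 0 reaching 4 failures at t = 50s, before any healthy executor frees up; with a 1-hour timeout, `firstAbort(3, 5, 3600, 4, 60)` returns `None` because the bad executor fails each task once and then sits idle.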

Two other workarounds:

(2) Just enable the per-task timeout when the legacy configuration is used, 
and leave it undocumented. We don't change behavior then, but the configuration 
is kind of a mess, and it'll be a headache to keep maintaining this.

(3) Add a timeout just to *taskset*-level blacklisting. That is a behavior 
change from the existing blacklisting, which has a timeout per *task*, but it 
removes the interaction with spark.task.maxFailures that we've always had to 
tiptoe around. I also think it might satisfy your use case even better. I 
still don't think it's a great solution to the problem, and we need something 
else to handle this sort of backoff better, so I don't feel great about it 
getting shoved into this feature.
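The difference between the existing per-*task* timeout and option (3) is largely a question of what the blacklist is keyed on. A minimal sketch, assuming a hypothetical `ExpiringBlacklist` helper (not a Spark class) and, for simplicity, ignoring any failure-count thresholds: the per-task variant keys on (task index, executor id), so other tasks can still land on a bad executor, while the taskset-level variant keys on the executor id alone.

```scala
import scala.collection.mutable

object BlacklistScopeSketch {
  // Hypothetical helper, not a Spark class: remembers when each key's blacklist entry expires.
  final class ExpiringBlacklist[K](timeoutMs: Long) {
    private val expiresAtMs = mutable.Map.empty[K, Long]

    // Blacklist `key` from `nowMs` until `nowMs + timeoutMs`.
    def add(key: K, nowMs: Long): Unit = expiresAtMs.update(key, nowMs + timeoutMs)

    // An entry counts only while it has not yet expired.
    def isBlacklisted(key: K, nowMs: Long): Boolean = expiresAtMs.get(key).exists(nowMs < _)
  }

  val timeoutMs = 10000L
  // Existing behavior: keyed per (task index, executor id).
  val perTask = new ExpiringBlacklist[(Int, String)](timeoutMs)
  // Option (3): keyed per executor id, for the whole task set.
  val perTaskSet = new ExpiringBlacklist[String](timeoutMs)

  // Task 0 fails on "exec-1" at t = 0 ms; record it in both variants.
  perTask.add((0, "exec-1"), 0L)
  perTaskSet.add("exec-1", 0L)
}
```

One second later, `perTask.isBlacklisted((1, "exec-1"), 1000L)` is `false` (task 1 may still be scheduled on the bad executor), while `perTaskSet.isBlacklisted("exec-1", 1000L)` is `true`; once 10 seconds have passed, the taskset-level entry expires as well.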

I'm thinking (3) is the best, but I will give it a bit more thought. Also cc 
@kayousterhout @tgravescs @markhamstra for opinions, since this is a bigger 
design point to consider.


[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66352/
Test PASSed.


[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66352 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66352/consoleFull)**
 for PR 15307 at commit 
[`02603c7`](https://github.com/apache/spark/commit/02603c7f56c8722d9003d09e40889084122ba40d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15356: [BUILD] Closing some stale PRs

2016-10-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15356
  
FYI, for 15294, the JIRA is set as `Won't fix`.


[GitHub] spark issue #15356: [BUILD] Closing some stale PRs

2016-10-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15356
  
Just to defend myself: it seems that a PR such as 15339, opened from one 
branch against another branch, leaves a failure mark on each commit in the 
branch ([branch-2.0](https://github.com/apache/spark/commits/branch-2.0)). Could 
you please take a look, @srowen?



[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15351
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66354/
Test FAILed.



[GitHub] spark issue #15356: [BUILD] Closing some stale PRs

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15356
  
**[Test build #66366 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66366/consoleFull)**
 for PR 15356 at commit 
[`a307b5e`](https://github.com/apache/spark/commit/a307b5e40d59e5ce40a0c3986a6db1553acea50a).



[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15351
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15351
  
**[Test build #66354 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66354/consoleFull)**
 for PR 15351 at commit 
[`bb6d6c1`](https://github.com/apache/spark/commit/bb6d6c1d689d096e9c7ec123b74ae364978d8d1c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15356: [BUILD] Closing some stale PRs

2016-10-04 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/15356

[BUILD] Closing some stale PRs

## What changes were proposed in this pull request?

This PR proposes to close some stale PRs, PRs that committers have suggested closing, and PRs that are obviously inappropriate.

Closes #13458
Closes #14565
Closes #15078
Closes #15278
Closes #15294
Closes #15339

## How was this patch tested?

N/A



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark closing-prs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15356.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15356


commit a307b5e40d59e5ce40a0c3986a6db1553acea50a
Author: hyukjinkwon 
Date:   2016-10-05T04:00:39Z

Closing some stale PRs





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66353/
Test FAILed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66353 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66353/consoleFull)**
 for PR 15307 at commit 
[`9fd6815`](https://github.com/apache/spark/commit/9fd681536bf8200af4b448f87e8cdbf17df2c0ba).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15292
  
**[Test build #66365 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66365/consoleFull)**
 for PR 15292 at commit 
[`3fa9f43`](https://github.com/apache/spark/commit/3fa9f43686f1195a9f86ab1bcda054119c332a20).



[GitHub] spark issue #15348: [SPARK-17758][SQL] Last returns wrong result in case of ...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15348
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15348: [SPARK-17758][SQL] Last returns wrong result in case of ...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15348
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66349/
Test PASSed.



[GitHub] spark issue #15348: [SPARK-17758][SQL] Last returns wrong result in case of ...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15348
  
**[Test build #66349 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66349/consoleFull)**
 for PR 15348 at commit 
[`5ae49ae`](https://github.com/apache/spark/commit/5ae49ae2d0f98a79712abd5ccad262ea9e0f9b5e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15348: [SPARK-17758][SQL] Last returns wrong result in case of ...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15348
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15348: [SPARK-17758][SQL] Last returns wrong result in case of ...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15348
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66351/
Test PASSed.



[GitHub] spark issue #15348: [SPARK-17758][SQL] Last returns wrong result in case of ...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15348
  
**[Test build #66351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66351/consoleFull)**
 for PR 15348 at commit 
[`8b442de`](https://github.com/apache/spark/commit/8b442debd33f6e985aa4ca536e2e8607db3ba477).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15249#discussion_r81896656
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+private[scheduler] object BlacklistTracker extends Logging {
+
+  private val DEFAULT_TIMEOUT = "1h"
+
+  /**
+   * Returns true if the blacklist is enabled, based on checking the configuration in the following
+   * order:
+   * 1. Is it specifically enabled or disabled?
+   * 2. Is it enabled via the legacy timeout conf?
+   * 3. Use the default for the spark-master:
+   *   - off for local mode
+   *   - on for distributed modes (including local-cluster)
+   */
+  def isBlacklistEnabled(conf: SparkConf): Boolean = {
+    conf.get(config.BLACKLIST_ENABLED) match {
+      case Some(isEnabled) =>
+        isEnabled
+      case None =>
+        // if they've got a non-zero setting for the legacy conf, always enable the blacklist,
+        // otherwise, use the default based on the cluster-mode (off for local-mode, on otherwise).
+        val legacyKey = config.BLACKLIST_LEGACY_TIMEOUT_CONF.key
+        conf.get(config.BLACKLIST_LEGACY_TIMEOUT_CONF) match {
+          case Some(legacyTimeout) =>
+            if (legacyTimeout == 0) {
+              logWarning(s"Turning off blacklisting due to legacy configuration:" +
+                s" $legacyKey == 0")
+              false
+            } else {
+              // mostly this is necessary just for tests, since real users that want the blacklist
+              // will get it anyway by default
+              logWarning(s"Turning on blacklisting due to legacy configuration:" +
+                s" $legacyKey > 0")
+              true
+            }
+          case None =>
+            // local-cluster is *not* considered local for these purposes, we still want the
+            // blacklist enabled by default
+            !Utils.isLocalMaster(conf)
+        }
+    }
+  }
+
+  def getBlacklistTimeout(conf: SparkConf): Long = {
+    conf.get(config.BLACKLIST_TIMEOUT_CONF).getOrElse {
+      conf.get(config.BLACKLIST_LEGACY_TIMEOUT_CONF).getOrElse {
+        Utils.timeStringAsMs(DEFAULT_TIMEOUT)
+      }
+    }
+  }
+
+  /**
+   * Verify that blacklist configurations are consistent; if not, throw an exception.  Should only
+   * be called if blacklisting is enabled.
+   *
+   * The configuration for the blacklist is expected to adhere to a few invariants.  Default
+   * values follow these rules of course, but users may unwittingly change one configuration
+   * without making the corresponding adjustment elsewhere.  This ensures we fail-fast when
+   * there are such misconfigurations.
+   */
+  def validateBlacklistConfs(conf: SparkConf): Unit = {
+
+    def mustBePos(k: String, v: String): Unit = {
+      throw new IllegalArgumentException(s"$k was $v, but must be > 0.")
+    }
+
+    // undocumented escape hatch for validation -- just for tests that want to run in an "unsafe"
+    // configuration.
+    if (!conf.get("spark.blacklist.testing.skipValidation", "false").toBoolean) {
+
+      Seq(
+        config.MAX_TASK_ATTEMPTS_PER_EXECUTOR,
+        config.MAX_TASK_ATTEMPTS_PER_NODE,
+        config.MAX_FAILURES_PER_EXEC_STAGE,
+        config.MAX_FAILED_EXEC_PER_NODE_STAGE
+      ).foreach { config =>
+        val v = conf.get(config)
+        if (v <= 0) {
+          mustBePos(config.key, v.toString)
+        }
+      }
+
+      val timeout = getBlacklistTimeout(conf)
+      if (timeout <= 0)
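
The resolution order in `isBlacklistEnabled`, the fallback chain in `getBlacklistTimeout`, and the positivity check in `validateBlacklistConfs` can be exercised without a `SparkConf`. The sketch below is a hypothetical model, assuming `Option` values stand in for unset configuration entries and a plain `Boolean` stands in for `Utils.isLocalMaster(conf)`; the object and method names are illustrative, not part of the patch.

```scala
// Hypothetical, SparkConf-free model of the blacklist configuration checks in the diff above.
// Option values stand in for unset configuration entries.
object BlacklistConfSketch {

  // Same value as DEFAULT_TIMEOUT = "1h" in the diff, expressed in milliseconds.
  val DefaultTimeoutMs: Long = 60L * 60L * 1000L

  /** Explicit flag first, then the legacy timeout, then the master-based default. */
  def isBlacklistEnabled(
      enabled: Option[Boolean],
      legacyTimeoutMs: Option[Long],
      isLocalMaster: Boolean): Boolean =
    enabled match {
      case Some(isEnabled) => isEnabled // 1. explicitly enabled or disabled
      case None =>
        legacyTimeoutMs match {
          case Some(legacy) => legacy != 0L // 2. legacy conf: 0 disables, any other value enables
          case None => !isLocalMaster // 3. off for local mode, on otherwise
        }
    }

  /** The new timeout conf wins, then the legacy conf, then the default. */
  def blacklistTimeoutMs(timeoutMs: Option[Long], legacyTimeoutMs: Option[Long]): Long =
    timeoutMs.orElse(legacyTimeoutMs).getOrElse(DefaultTimeoutMs)

  /** Fails fast on the first non-positive value, like validateBlacklistConfs. */
  def requirePositive(confs: Seq[(String, Long)]): Unit =
    confs.foreach { case (key, value) =>
      if (value <= 0) {
        throw new IllegalArgumentException(s"$key was $value, but must be > 0.")
      }
    }
}
```

For example, `isBlacklistEnabled(None, Some(0L), isLocalMaster = false)` is `false`: a zero legacy timeout turns the blacklist off even on a distributed master, matching the first `logWarning` branch in the diff.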

[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-04 Thread weiqingy

Github user weiqingy commented on the issue:

https://github.com/apache/spark/pull/15246
  
Retest this please.



[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #66364 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66364/consoleFull)**
 for PR 15314 at commit 
[`423fd51`](https://github.com/apache/spark/commit/423fd5117e32e971e47a02728d6a863a726fc539).



[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-04 Thread weiqingy

Github user weiqingy commented on the issue:

https://github.com/apache/spark/pull/15246
  
These changes should not affect `org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.pattern based subscription`, so I'll re-trigger the tests.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66350/
Test FAILed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66350 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66350/consoleFull)**
 for PR 15307 at commit 
[`bbd0d8b`](https://github.com/apache/spark/commit/bbd0d8bacae529cdb5e43b5165e3c687c5c9ec05).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14828: [SPARK-17258][SQL] Parse scientific decimal literals as ...

2016-10-04 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14828
  
@gatorsmile does this LGTY?



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread koeninger

Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15355
  
I have generally been unable to reproduce these kinds of test failures in my local environment, and I don't have access to the build server, so trying to fix them without a repro is pretty much shooting randomly in the dark. It does seem unfortunate to me that we're effectively running full integration tests on every PR, even if a patch has changed something (e.g. MLlib) that couldn't possibly affect the modules in /external.
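
One way to avoid running unrelated suites is to derive the set of test modules from the files a patch touches, then add every module that depends on them. The sketch below assumes a hypothetical prefix-to-module table with illustrative dependency edges; it is not Spark's actual build or CI configuration.

```scala
// Hypothetical sketch: pick test modules from the paths a patch changes.
// The module table and its dependency edges are illustrative, not Spark's real build setup.
object ChangedModuleSketch {
  /** A module owns some path prefixes; `dependents` are modules whose tests must rerun too. */
  final case class Module(name: String, pathPrefixes: Seq[String], dependents: Seq[String])

  val modules: Seq[Module] = Seq(
    Module("core", Seq("core/"), Seq("sql", "mllib", "streaming-kafka-0-10")),
    Module("sql", Seq("sql/"), Seq("mllib")),
    Module("mllib", Seq("mllib/"), Seq.empty),
    Module("streaming-kafka-0-10", Seq("external/kafka-0-10/"), Seq.empty)
  )

  /** Modules touched by the changed files, plus every module that transitively depends on them. */
  def modulesToTest(changedFiles: Seq[String]): Set[String] = {
    val byName = modules.map(m => m.name -> m).toMap
    val touched = modules
      .filter(m => changedFiles.exists(file => m.pathPrefixes.exists(prefix => file.startsWith(prefix))))
      .map(_.name)

    // Breadth-first walk over the `dependents` edges, skipping modules already selected.
    @annotation.tailrec
    def closeOver(frontier: List[String], acc: Set[String]): Set[String] = frontier match {
      case Nil => acc
      case current :: rest =>
        val newDependents =
          byName.get(current).map(_.dependents).getOrElse(Seq.empty).filterNot(acc.contains)
        closeOver(rest ++ newDependents, acc ++ newDependents)
    }

    closeOver(touched.toList, touched.toSet)
  }
}
```

With this table, a patch touching only `mllib/` selects just the `mllib` module, so the Kafka suites under `external/` would not run; a change under `core/` still fans out to everything downstream.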



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66348/
Test PASSed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66348 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66348/consoleFull)**
 for PR 15307 at commit 
[`05f22d7`](https://github.com/apache/spark/commit/05f22d7974f410289028bfa4df1d2f6036f5023e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-04 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r81895371
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -525,8 +645,62 @@ class StreamExecution(
   case object TERMINATED extends State
 }
 
-object StreamExecution {
+object StreamExecution extends Logging {
   private val _nextId = new AtomicLong(0)
 
+  /**
+   * Get the number of input rows from the executed plan of the trigger
+   * @param triggerExecutionPlan Execution plan of the trigger
+   * @param triggerLogicalPlan Logical plan of the trigger, generated from the query logical plan
+   * @param sourceToDataframe Source to DataFrame returned by the source.getBatch in this trigger
+   */
+  def getNumInputRowsFromTrigger(
+      triggerExecutionPlan: SparkPlan,
+      triggerLogicalPlan: LogicalPlan,
+      sourceToDataframe: Map[Source, DataFrame]): Map[Source, Long] = {
+
+    // We want to associate execution plan leaves to sources that generate them, so that we match
+    // their metrics (e.g. numOutputRows) to the sources. To do this we do the following.
+    // Consider the translation from the streaming logical plan to the final executed plan.
+    //
+    //  streaming logical plan (with sources) <==> trigger's logical plan <==> executed plan
+    //
+    // 1. We keep track of streaming sources associated with each leaf in the trigger's logical plan
+    //    - Each logical plan leaf will be associated with a single streaming source.
+    //    - There can be multiple logical plan leaves associated with a streaming source.
+    //    - There can be leaves not associated with any streaming source, because they were
+    //      generated from a batch source (e.g. stream-batch joins)
+    //
+    // 2. Assuming that the executed plan has same number of leaves in the same order as that of
+    //    the trigger logical plan, we associate executed plan leaves with corresponding
+    //    streaming sources.
+    //
+    // 3. For each source, we sum the metrics of the associated execution plan leaves.
+    //
+    val logicalPlanLeafToSource = sourceToDataframe.flatMap { case (source, df) =>
+      df.logicalPlan.collectLeaves().map { leaf => leaf -> source }
+    }
+    val allLogicalPlanLeaves = triggerLogicalPlan.collectLeaves() // includes non-streaming sources
+    val allExecPlanLeaves = triggerExecutionPlan.collectLeaves()
+    if (allLogicalPlanLeaves.size == allExecPlanLeaves.size) {
+      val execLeafToSource = allLogicalPlanLeaves.zip(allExecPlanLeaves).flatMap {
+        case (lp, ep) => logicalPlanLeafToSource.get(lp).map { source => ep -> source }
+      }
+      val sourceToNumInputRows = execLeafToSource.map { case (execLeaf, source) =>
+        val numRows = execLeaf.metrics.get("numOutputRows").map(_.value).getOrElse(0L)
+        source -> numRows
+      }
+      sourceToNumInputRows.groupBy(_._1).mapValues(_.map(_._2).sum) // sum up rows for each source
+    } else {
+      def toString[T](seq: Seq[T]): String = s"(size = ${seq.size}), ${seq.mkString(", ")}"
+      logWarning(
+        "Could not report metrics as number leaves in trigger logical plan did not match that" +
--- End diff --

A warning printed once can get lost in the logs. I think it's worth printing it every minute or so, so that if we have to debug, it's easy to find, rather than having to look for the logs from when the query started.

Furthermore, I don't want to add new flag/timestamp fields to StreamExecution to track whether the warning has been printed once or within the last minute. So I am thinking of adding a small utility trait with a method `logWarningEvery(period, ...)`.
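
The proposed helper could look roughly like the trait below. It is a hypothetical sketch, not an existing Spark API: it remembers the last-logged time per message key, takes an overridable clock so it can be tested, and writes to an abstract `emitWarning` sink where a real version would call `logWarning` from Spark's `Logging`.

```scala
import scala.collection.mutable

// Hypothetical sketch of the proposed logWarningEvery(period, ...) helper; not an existing Spark API.
trait RateLimitedWarnings {
  // Sink for warnings; a real version would call logWarning from Spark's Logging trait.
  protected def emitWarning(msg: String): Unit

  // Current time in milliseconds; overridable so tests can substitute a fake clock.
  protected def nowMs(): Long = System.currentTimeMillis()

  // Last time (in ms) each message key was actually logged.
  private val lastLoggedMs = mutable.Map.empty[String, Long]

  /**
   * Logs `msg` unless a warning with the same `key` was logged less than `periodMs` ago.
   * The message is only built when it is emitted. Returns true if the warning was emitted.
   */
  protected def logWarningEvery(key: String, periodMs: Long)(msg: => String): Boolean =
    lastLoggedMs.synchronized {
      val now = nowMs()
      val due = lastLoggedMs.get(key).forall(last => now - last >= periodMs)
      if (due) {
        lastLoggedMs(key) = now
        emitWarning(msg)
      }
      due
    }
}
```

Keying by message keeps unrelated warnings from suppressing each other. The state still exists, but it lives in the mixed-in trait rather than as ad hoc fields in `StreamExecution`; a caller in the `else` branch of the diff might use something like `logWarningEvery("leaf-count-mismatch", 60000L)(message)`.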



[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66355/
Test FAILed.



[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #66355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66355/consoleFull)** for PR 15314 at commit [`07c156a`](https://github.com/apache/spark/commit/07c156a2b1c3ca60ff1fc4582c9024c333e3a064).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15249#discussion_r81895140
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Clock
+
+/**
+ * Handles blacklisting executors and nodes within a taskset.  This includes blacklisting specific
+ * (task, executor) / (task, nodes) pairs, and also completely blacklisting executors and nodes
+ * for the entire taskset.
+ *
+ * THREADING:  As a helper to [[TaskSetManager]], this class is designed to only be called from code
+ * with a lock on the TaskScheduler (e.g. its event handlers). It should not be called from other
+ * threads.
+ */
+private[scheduler] class TaskSetBlacklist(val conf: SparkConf, val stageId: Int, val clock: Clock)
+    extends Logging {
+
+  private val MAX_TASK_ATTEMPTS_PER_EXECUTOR = conf.get(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR)
+  private val MAX_TASK_ATTEMPTS_PER_NODE = conf.get(config.MAX_TASK_ATTEMPTS_PER_NODE)
+  private val MAX_FAILURES_PER_EXEC_STAGE = conf.get(config.MAX_FAILURES_PER_EXEC_STAGE)
+  private val MAX_FAILED_EXEC_PER_NODE_STAGE = conf.get(config.MAX_FAILED_EXEC_PER_NODE_STAGE)
+  private val TIMEOUT_MILLIS = BlacklistTracker.getBlacklistTimeout(conf)
+
+  /**
+   * A map from each executor to the task failures on that executor.
+   */
+  val execToFailures: HashMap[String, ExecutorFailuresInTaskSet] = new HashMap()
+
+  /**
+   * Map from node to all executors on it with failures.  Needed because we want to know about
+   * executors on a node even after they have died.
+   */
+  private val nodeToExecsWithFailures: HashMap[String, HashSet[String]] = new HashMap()
+  private val nodeToBlacklistedTasks: HashMap[String, HashSet[Int]] = new HashMap()
+  private val blacklistedExecs: HashSet[String] = new HashSet()
+  private val blacklistedNodes: HashSet[String] = new HashSet()
+
+  /**
+   * Return true if this executor is blacklisted for the given task.  This does *not*
+   * need to return true if the executor is blacklisted for the entire stage.
+   * That is to keep this method as fast as possible in the inner-loop of the
+   * scheduler, where those filters will have already been applied.
+   */
+  def isExecutorBlacklistedForTask(
+      executorId: String,
+      index: Int): Boolean = {
+    execToFailures.get(executorId)
+      .map { execFailures =>
+        val count = execFailures.taskToFailureCountAndExpiryTime.get(index).map(_._1).getOrElse(0)
+        count >= MAX_TASK_ATTEMPTS_PER_EXECUTOR
+      }
+      .getOrElse(false)
+  }
+
+  def isNodeBlacklistedForTask(
+      node: String,
+      index: Int): Boolean = {
+    nodeToBlacklistedTasks.get(node)
+      .map(_.contains(index))
+      .getOrElse(false)
+  }
+
+  /**
+   * Return true if this executor is blacklisted for the given stage.  Completely ignores whether
+   * anything to do with the node the executor is on.  That
+   * is to keep this method as fast as possible in the inner-loop of the scheduler, where those
+   * filters will already have been applied.
+   */
+  def isExecutorBlacklistedForTaskSet(executorId: String): Boolean = {
+    blacklistedExecs.contains(executorId)
+  }
+
+  def isNodeBlacklistedForTaskSet(node: String): Boolean = {
+    blacklistedNodes.contains(node)
+  }
--- End diff --
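The per-task lookups in this hunk can be modeled in a few lines. This is a simplified, hypothetical sketch: `TaskSetBlacklistSketch` and its `recordFailure` update rule are assumptions (the quoted hunk does not show how failures are recorded), and expiry times, taskset-wide blacklisting, and `SparkConf` are omitted. It also shows that `opt.map(p).getOrElse(false)` can be written more directly as `opt.exists(p)`.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the per-task blacklist lookups.
// Expiry times, taskset-wide blacklisting, and SparkConf are omitted.
final class TaskSetBlacklistSketch(maxAttemptsPerExecutor: Int, maxAttemptsPerNode: Int) {
  // executor -> (task index -> failure count on that executor)
  private val execToFailureCounts = mutable.HashMap[String, mutable.HashMap[Int, Int]]()
  // node -> (task index -> failure count across all executors on that node)
  private val nodeToFailureCounts = mutable.HashMap[String, mutable.HashMap[Int, Int]]()
  // node -> task indices that should no longer run anywhere on that node
  private val nodeToBlacklistedTasks = mutable.HashMap[String, mutable.HashSet[Int]]()

  /** Assumed update rule: record one failure of task `index` on `exec` running on `node`. */
  def recordFailure(node: String, exec: String, index: Int): Unit = {
    val execCounts = execToFailureCounts.getOrElseUpdate(exec, mutable.HashMap.empty)
    execCounts(index) = execCounts.getOrElse(index, 0) + 1
    val nodeCounts = nodeToFailureCounts.getOrElseUpdate(node, mutable.HashMap.empty)
    nodeCounts(index) = nodeCounts.getOrElse(index, 0) + 1
    if (nodeCounts(index) >= maxAttemptsPerNode) {
      nodeToBlacklistedTasks.getOrElseUpdate(node, mutable.HashSet.empty) += index
    }
  }

  // `opt.exists(p)` is equivalent to `opt.map(p).getOrElse(false)`.
  def isExecutorBlacklistedForTask(exec: String, index: Int): Boolean =
    execToFailureCounts.get(exec).exists(_.getOrElse(index, 0) >= maxAttemptsPerExecutor)

  def isNodeBlacklistedForTask(node: String, index: Int): Boolean =
    nodeToBlacklistedTasks.get(node).exists(_.contains(index))
}
```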

I know it's verbose, but I'd prefer to keep it. Especially once application-level blacklisting is added (https://github.com/apache/spark/pull/14079), there are lots of different

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66362/consoleFull)** for PR 15307 at commit [`f5732a5`](https://github.com/apache/spark/commit/f5732a50da7f0df326f52ad9b85da3876ecfafbc).



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12135
  
**[Test build #66363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66363/consoleFull)** for PR 12135 at commit [`a475090`](https://github.com/apache/spark/commit/a475090f5424752a1cfe04983d964f6fb85181b0).



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15355
  
**[Test build #66361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66361/consoleFull)** for PR 15355 at commit [`b7074d4`](https://github.com/apache/spark/commit/b7074d48159804035eaf00e1abed35e408684b42).



[GitHub] spark issue #15355: [SPARK-17782][STREAMING]Disable Kafka 010 pattern based ...

2016-10-04 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15355
  
cc @koeninger any idea why this is flaky?



[GitHub] spark pull request #15355: [SPARK-17782][STREAMING]Disable Kafka 010 pattern...

2016-10-04 Thread hvanhovell

GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/15355

[SPARK-17782][STREAMING]Disable Kafka 010 pattern based subscription test.

## What changes were proposed in this pull request?
This PR disables the `pattern based subscription` test in the Kafka 010 `DirectKafkaStreamSuite`, because the test is flaky.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-17782

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15355


commit b7074d48159804035eaf00e1abed35e408684b42
Author: Herman van Hovell 
Date:   2016-10-05T03:12:08Z

Disable Kafka 010 pattern based subscription test.





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-10-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #66360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66360/consoleFull)** for PR 14452 at commit [`cebfbf5`](https://github.com/apache/spark/commit/cebfbf5e3dd7b2d2365e5152991ab7ff2c63dd90).


