[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17166 The task killed messages should be informative, and I don't think we should sacrifice informative messages just so they can be shown concisely in the stage summary view. I think it's much better to have an informative message that a user needs to click one link to see than it is to force the user to look in the logs to figure out what's going on (especially since behavior around tasks getting killed can be very confusing). If others feel strongly I can be convinced to add this per-task info to the summary view, but I'm not convinced of the need to sacrifice clarity for conciseness. @mridum @rxin what do you two think here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/17166 Drilling down into the detail view is kind of cumbersome -- I think it's most useful to have a good summary at the progress bar, and then the user can refer to logs for detailed per-task debugging.
[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106088842 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = { val (items, conditions) = extractInnerJoins(plan) +// Find the star schema joins. Currently, it returns the star join with the largest +// fact table. In the future, it can return more than one star join (e.g. F1-D1-D2 +// and F2-D3-D4). +val starJoinPlans = StarSchemaDetection(conf).findStarJoins(items, conditions.toSeq) --- End diff -- @ron8hu We already ran TPC-DS with star schema and the results are documented in the design doc. I don't think there is a question about its value. I am familiar with Pat Selinger's paper since I've been working in the IBM DB2 optimizer for several years. What Zhenhua and I are discussing here is how to integrate the star join plans with his new DP planning. There is no competing planning algorithm that needs to be tested.
[GitHub] spark pull request #17299: [SPARK-19817][SS] Make it clear that `timeZone` i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17299
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17299 Thanks! Merging to master.
[GitHub] spark issue #17186: [SPARK-19846][SQL] Add a flag to disable constraint prop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17186 **[Test build #74581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74581/testReport)** for PR 17186 at commit [`d3b0a72`](https://github.com/apache/spark/commit/d3b0a7237c5f70ea64a786ecf63edac13617d284).
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17166 Why not just "X killed" in the stage summary? It seems like overkill to put the reasons for all of the killings there, now that I'm seeing the screenshot, since they're already in the detail view (and the whole reason we have the detail view is to show per-task info).
[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106086454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = { val (items, conditions) = extractInnerJoins(plan) +// Find the star schema joins. Currently, it returns the star join with the largest +// fact table. In the future, it can return more than one star join (e.g. F1-D1-D2 +// and F2-D3-D4). +val starJoinPlans = StarSchemaDetection(conf).findStarJoins(items, conditions.toSeq) --- End diff -- The dynamic programming solution proposed by Pat was also used in DB2. She is the mother of DB2. We know the limits of this solution; it is unable to solve all the issues. [The above suggestion](https://github.com/apache/spark/pull/15363#issuecomment-285187051) by Ioana (who is the senior DB2 compiler expert) is in the right direction, IMO.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #74580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74580/testReport)** for PR 15945 at commit [`11d2757`](https://github.com/apache/spark/commit/11d2757fd1299e2499b2739013cc454392f1a524).
[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106084556 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = { val (items, conditions) = extractInnerJoins(plan) +// Find the star schema joins. Currently, it returns the star join with the largest +// fact table. In the future, it can return more than one star join (e.g. F1-D1-D2 +// and F2-D3-D4). +val starJoinPlans = StarSchemaDetection(conf).findStarJoins(items, conditions.toSeq) --- End diff -- As discussed earlier, we only need to perform the join reorder algorithm once. CostBasedJoinReorder implements the dynamic programming algorithm published in the classic paper "Access Path Selection in a Relational Database System" by Patricia Selinger. The same algorithm was used in PostgreSQL. To my understanding, it is a generic algorithm that can work on both star and non-star schemas. For example, it is capable of generating a bushy tree if that is optimal; that is, it is not limited to left-deep trees only. I suggest that we identify the strengths of the star join reorder algorithm and where it can address the deficiencies of the dynamic programming algorithm, then add the necessary code to address those deficiencies. There is no need to add code that does the same job twice without added value. Perhaps running the TPC-DS benchmark queries and inspecting the generated query plans can help us identify the strengths and weaknesses of both algorithms.
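The Selinger-style dynamic program Ron refers to can be sketched as follows. This is a toy illustration only, with an invented cost model (cumulative output cardinality) and invented helper names; it is not Spark's `CostBasedJoinReorder`, but it shows how enumerating all splits of each relation subset allows bushy plans, not just left-deep ones:

```python
from itertools import combinations

def dp_join_order(rels, card, sel):
    """Toy Selinger-style DP over relation subsets.

    rels: list of relation names
    card: dict name -> base row count
    sel:  dict frozenset({a, b}) -> join selectivity (1.0 if no predicate)

    Returns (cost, plan_tree, output_rows) for joining all relations.
    Because every left/right split of each subset is considered,
    bushy trees are allowed, not just left-deep ones.
    """
    best = {}  # frozenset of relations -> (cost, plan, output rows)
    for r in rels:
        best[frozenset([r])] = (0.0, r, float(card[r]))
    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            s = frozenset(subset)
            # Enumerate all ways to split s into two non-empty sides.
            for k in range(1, size // 2 + 1):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r_side = s - l
                    lc, lp, lrows = best[l]
                    rc, rp, rrows = best[r_side]
                    # Combine the pairwise selectivities across the split.
                    f = 1.0
                    for a in l:
                        for b in r_side:
                            f *= sel.get(frozenset([a, b]), 1.0)
                    rows = lrows * rrows * f
                    cost = lc + rc + rows  # toy cost: sum of output sizes
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (lp, rp), rows)
    return best[frozenset(rels)]
```

For a small star schema (fact table `F`, dimensions `D1`, `D2`) the DP finds the cheapest order among all splits, which is the behavior being contrasted with a dedicated star-join heuristic.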
[GitHub] spark pull request #17274: [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFi...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17274#discussion_r106083228 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -177,6 +177,13 @@ test_that("add and get file to be downloaded with Spark job on every node", { spark.addFile(path) download_path <- spark.getSparkFiles(filename) expect_equal(readLines(download_path), words) + + # Test spark.getSparkFiles works well on executors. + seq <- seq(from = 1, to = 10, length.out = 5) + f <- function(seq) { readLines(spark.getSparkFiles(filename)) } + results <- spark.lapply(seq, f) + for (i in 1:5) { expect_equal(results[[i]], words) } + --- End diff -- * It fails when I run ```run-tests.sh``` on my machine. * It succeeds when I paste this code into the SparkR console. * It succeeds when I paste this code into a text file and submit it with ```bin/spark-submit test.R``` (local mode) or ```bin/spark-submit --master yarn test.R``` (yarn mode). So I think it may be caused by the test suite infrastructure, but I'm not familiar with that part.
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74578/ Test PASSed.
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17299 Merged build finished. Test PASSed.
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17299 **[Test build #74578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74578/testReport)** for PR 17299 at commit [`0b13c08`](https://github.com/apache/spark/commit/0b13c0850472b5172a9f428f923b99a168a79e00). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74576/ Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74576/testReport)** for PR 17166 at commit [`72b28cb`](https://github.com/apache/spark/commit/72b28cb0dc7aacf7cde1b5e49f05da49cb5de276). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class KillTask(taskId: Long, executor: String, interruptThread: Boolean, reason: String)`
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15009 **[Test build #74579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74579/testReport)** for PR 15009 at commit [`c17f15f`](https://github.com/apache/spark/commit/c17f15f3994ba0cba4be63519f33ce4429adf489).
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Merged build finished. Test PASSed.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74577/ Test PASSed.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #74577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74577/testReport)** for PR 15945 at commit [`870222e`](https://github.com/apache/spark/commit/870222e8ec1b6e7aa32a0260a045192323ba8d30). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74575/ Test PASSed.
[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15604 Merged build finished. Test PASSed.
[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15604 **[Test build #74575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74575/testReport)** for PR 15604 at commit [`17e11f0`](https://github.com/apache/spark/commit/17e11f0c56e2a581766c06bd52695c2b05bcfcb2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74572/ Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74572/testReport)** for PR 17166 at commit [`31967d1`](https://github.com/apache/spark/commit/31967d185852870d8edecb855ea1aafb7bd04dd1). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class KillTask(taskId: Long, executor: String, interruptThread: Boolean, reason: String)`
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user kishorvpatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15009#discussion_r106080005 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -252,20 +307,55 @@ class YarnClusterSuite extends BaseYarnClusterSuite { handle.getAppId() should startWith ("application_") handle.disconnect() + var applicationId = ConverterUtils.toApplicationId(handle.getAppId) + var yarnClient: YarnClient = getYarnClient eventually(timeout(30 seconds), interval(100 millis)) { handle.getState() should be (SparkAppHandle.State.LOST) +var status = yarnClient.getApplicationReport(applicationId).getFinalApplicationStatus +status should be (FinalApplicationStatus.KILLED) } + } finally { handle.kill() } } - test("timeout to get SparkContext in cluster mode triggers failure") { -val timeout = 2000 -val finalState = runSpark(false, mainClassName(SparkContextTimeoutApp.getClass), - appArgs = Seq((timeout * 4).toString), - extraConf = Map(AM_MAX_WAIT_TIME.key -> timeout.toString)) -finalState should be (SparkAppHandle.State.FAILED) + test("monitor app using launcher library for thread without auto shutdown") { +val env = new JHashMap[String, String]() +env.put("YARN_CONF_DIR", hadoopConfDir.getAbsolutePath()) + +val propsFile = createConfFile() +val handle = new SparkLauncher(env) .setSparkHome(sys.props("spark.test.home")) .setConf("spark.ui.enabled", "false") .setPropertiesFile(propsFile) .setMaster("yarn") .setDeployMode("cluster") .launchAsThread(true) .setAppResource(SparkLauncher.NO_RESOURCE) .setMainClass(mainClassName(YarnLauncherTestApp.getClass)) .startApplication() + +try { + eventually(timeout(30 seconds), interval(100 millis)) { +handle.getState() should be (SparkAppHandle.State.RUNNING) + } + + handle.getAppId() should not be (null) + handle.getAppId() should startWith ("application_") + handle.disconnect() + + var applicationId = ConverterUtils.toApplicationId(handle.getAppId) + var yarnClient: YarnClient = getYarnClient + eventually(timeout(30 seconds), interval(100 millis)) { +handle.getState() should be (SparkAppHandle.State.LOST) +var status = yarnClient.getApplicationReport(applicationId).getYarnApplicationState +status should not be (YarnApplicationState.KILLED) --- End diff -- The condition checks that the status should `not` be KILLED: it could be running or successful, but not killed.
[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17267 LGTM cc @davies @holdenk
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17299 LGTM, thank you for your follow-up!
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17297 Merged build finished. Test FAILed.
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17297 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74566/ Test FAILed.
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17297 **[Test build #74566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74566/testReport)** for PR 17297 at commit [`0bcc69a`](https://github.com/apache/spark/commit/0bcc69a7a3094ddaa8c915be1e4a198a354f8b6b). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TasksAborted(stageId: Int, tasks: Seq[Task[_]]) extends DAGSchedulerEvent`
[GitHub] spark issue #17300: [SPARK-19956][Core]Optimize a location order of blocks w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17300 Can one of the admins verify this patch?
[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...
GitHub user ConeyLiu opened a pull request: https://github.com/apache/spark/pull/17300 [SPARK-19956][Core]Optimize a location order of blocks with topology information ## What changes were proposed in this pull request? When calling BlockManager's getLocations method, we only compare the hosts of the data blocks. Non-local blocks are chosen at random, which may select a block in a different rack. This patch therefore additionally sorts the locations by rack. ## How was this patch tested? New test case. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ConeyLiu/spark blockmanager Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17300.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17300 commit 926ea2551487f170b24f06a0d6c11879103b05b5 Author: Xianyang Liu Date: 2017-03-15T02:42:49Z optimize a location order of blocks with topology information
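The rack-aware ordering proposed in this PR can be sketched as follows. This is a simplified Python illustration, not Spark's actual BlockManager API: the function name, the `rack_of` lookup, and the host names are all hypothetical.

```python
def sort_locations(locations, local_host, local_rack, rack_of):
    """Order candidate block locations by locality: the local host first,
    then hosts in the same rack, then everything else.
    `rack_of` maps a host name to its rack (topology information)."""
    local = [h for h in locations if h == local_host]
    same_rack = [h for h in locations if h != local_host and rack_of(h) == local_rack]
    off_rack = [h for h in locations if h != local_host and rack_of(h) != local_rack]
    return local + same_rack + off_rack

# Illustrative topology: host1 and host2 share rack1, host3 is in rack2.
racks = {"host1": "rack1", "host2": "rack1", "host3": "rack2"}
print(sort_locations(["host3", "host2", "host1"], "host1", "rack1", racks.get))
# -> ['host1', 'host2', 'host3']
```

Without the rack tier, a random pick among the non-local hosts could return `host3` even though `host2` sits in the same rack.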
[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17267 ping @viirya
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74570/ Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74570/testReport)** for PR 17166 at commit [`6c90289`](https://github.com/apache/spark/commit/6c902898b7b29f5f32a1618cfbf06e39c8fc3f0f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17232: [SPARK-18112] [SQL] Support reading data from Hiv...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17232
[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17232 thanks, merging to master!
[GitHub] spark pull request #17178: [SPARK-19828][R] Support array type in from_json ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17178
[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17178 thanks! merged to master.
[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 Thank you @marmbrus for the detailed explanation! > For that reason, I think its safest to require the user to explicitly include the timestamp. Yea, let me update this in this direction.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15125 @jkbradley @ankurdave would you like to review?
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17297 Merged build finished. Test FAILed.
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17297 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74562/ Test FAILed.
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17297 **[Test build #74562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74562/testReport)** for PR 17297 at commit [`f127150`](https://github.com/apache/spark/commit/f1271506d0f5d5d037cee91cc91d42ddb14a8038). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TasksAborted(stageId: Int, tasks: Seq[Task[_]]) extends DAGSchedulerEvent`
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 @cloud-fan . Sorry, but could you review this `stack` PR once again? :)
[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106074890 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = { val (items, conditions) = extractInnerJoins(plan) +// Find the star schema joins. Currently, it returns the star join with the largest +// fact table. In the future, it can return more than one star join (e.g. F1-D1-D2 +// and F2-D3-D4). +val starJoinPlans = StarSchemaDetection(conf).findStarJoins(items, conditions.toSeq) --- End diff -- @wzhfy I've looked into moving the star reordering to the end of the optimization phase. Star reordering uses the existing ```ReorderJoin.createOrderedJoin``` method to construct the final plan once a star join is discovered. This method only handles specific types of plans, and doesn't recognize the plan layout in the last phase of the Optimizer. Writing a new join reordering method for this purpose would not make much sense, since star joins are to be used by the existing planning strategies. I suggest keeping the current logic and, next, I can look into integrating the star plans with your new DP planning. Once that's tested, we can probably remove the star schema call from the ```ReorderJoin``` planning rule. Please let me know what you think.
[GitHub] spark issue #17266: [SPARK-19912][SQL] String literals should be escaped for...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17266 For the non-error-message cases, an incorrect result is also a problem in this issue.
[GitHub] spark issue #17241: [SPARK-19877][SQL] Restrict the nested level of a view
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/17241 ping @hvanhovell @gatorsmile
[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17232 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74574/ Test PASSed.
[GitHub] spark issue #17266: [SPARK-19912][SQL] String literals should be escaped for...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17266 The following is the error message. Since we are not escaping in the Spark master, the behavior (incorrect filtering or this error message) is the same as in the master branch of Spark. ``` java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:612) ... Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Error parsing partition filter : line 1:8 mismatched character '' expecting '"' at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:599) ... 103 more Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Error parsing partition filter : line 1:8 mismatched character '' expecting '"' at org.apache.hadoop.hive.metastore.ObjectStore.getFilterParser(ObjectStore.java:2569) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:2512) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:2335) ... org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:4442) ... org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1103) ... at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2254) ... 108 more ``` HIVE-11723 seems to resolve that in SemanticAnalyzer. So, I need to try that soon.
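The failure mode above — an unescaped quote character truncating the pushed-down partition filter and producing the `mismatched character '' expecting '"'` parse error — can be illustrated with a minimal escaping helper. This is a hedged sketch: the function name is hypothetical and Hive's real filter grammar is more involved than a single quoting rule.

```python
def quote_partition_value(value: str) -> str:
    """Wrap a partition value for a Hive-style filter string,
    escaping embedded double quotes (illustrative only, not
    Hive's or Spark's actual escaping code)."""
    escaped = value.replace('"', '\\"')
    return f'"{escaped}"'

# Passing a value like a"b through unescaped would end the quoted
# literal early; escaping keeps the filter string parseable.
print(quote_partition_value('a"b'))  # prints "a\"b"
```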
[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17232 Merged build finished. Test PASSed.
[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17232 **[Test build #74574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74574/testReport)** for PR 17232 at commit [`80f33da`](https://github.com/apache/spark/commit/80f33da13dd6e3bd9820ab6fdd641404f0ad2a0b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17255
[GitHub] spark pull request #17277: [SPARK-19887][SQL] dynamic partition keys can be ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17277
[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17255 thanks, merging to master!
[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r106072933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -40,18 +40,11 @@ private[sql] object JsonInferSchema { json: RDD[T], configOptions: JSONOptions, createParser: (JsonFactory, T) => JsonParser): StructType = { -require(configOptions.samplingRatio > 0, - s"samplingRatio (${configOptions.samplingRatio}) should be greater than 0") val shouldHandleCorruptRecord = configOptions.permissive val columnNameOfCorruptRecord = configOptions.columnNameOfCorruptRecord -val schemaData = if (configOptions.samplingRatio > 0.99) { - json -} else { - json.sample(withReplacement = false, configOptions.samplingRatio, 1) -} --- End diff -- I think it's fine
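For readers following the diff, the removed `samplingRatio` logic corresponds roughly to this pattern: validate the ratio, use the full dataset when the ratio is close to 1, otherwise sample without replacement before inferring the schema. A simplified Python sketch (names illustrative, not Spark's API):

```python
import random

def select_schema_data(records, sampling_ratio, seed=1):
    """Mirror of the removed Scala logic: reject non-positive ratios,
    skip sampling when the ratio is effectively 1.0, otherwise take
    a Bernoulli sample (each record kept with probability `sampling_ratio`)."""
    if sampling_ratio <= 0:
        raise ValueError(f"samplingRatio ({sampling_ratio}) should be greater than 0")
    if sampling_ratio > 0.99:
        return list(records)
    rng = random.Random(seed)  # fixed seed, as in the original (seed = 1)
    return [r for r in records if rng.random() < sampling_ratio]
```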
[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17232 LGTM
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17299 LGTM
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17299 **[Test build #74578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74578/testReport)** for PR 17299 at commit [`0b13c08`](https://github.com/apache/spark/commit/0b13c0850472b5172a9f428f923b99a168a79e00).
[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17299 This is the fix for the streaming counterpart (i.e. Structured Streaming). @ueshin @gatorsmile would you take a look? Thanks!
[GitHub] spark pull request #17299: [SPARK-19817][SS] Make it clear that `timeZone` i...
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17299 [SPARK-19817][SS] Make it clear that `timeZone` is a general option in DataStreamReader/Writer ## What changes were proposed in this pull request? As the timezone setting can also affect partition values and applies to all formats, we should make this clear. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/lw-lin/spark timezone Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17299.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17299 commit 0b13c0850472b5172a9f428f923b99a168a79e00 Author: Liwei Lin Date: 2017-03-15T02:06:06Z Initial commit
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17298 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74568/ Test FAILed.
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17298 Merged build finished. Test FAILed.
[GitHub] spark issue #17164: [SPARK-16844][SQL] Support codegen for sort-based aggrea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17164 Merged build finished. Test PASSed.
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17298 **[Test build #74568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74568/testReport)** for PR 17298 at commit [`15d999b`](https://github.com/apache/spark/commit/15d999bb901aa0a0eef73ff50f2ba3d24c4d3f72). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17164: [SPARK-16844][SQL] Support codegen for sort-based aggrea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74569/ Test PASSed.
[GitHub] spark issue #17164: [SPARK-16844][SQL] Support codegen for sort-based aggrea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17164 **[Test build #74569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74569/testReport)** for PR 17164 at commit [`5baa928`](https://github.com/apache/spark/commit/5baa928d758eaf4c6711c4a8d67611995ca3af25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait AggregateCodegenHelper`
  * `abstract class AggregateExec extends UnaryExecNode`
[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...
Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/17237 Sure. No problem!
[GitHub] spark pull request #17223: [SPARK-19881][SQL] Support Dynamic Partition Inse...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/17223
[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17223 I'll close this PR and JIRA issue.
[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17223 I see, so that's the reason not to support it. Thank you, @cloud-fan.
[GitHub] spark issue #17266: [SPARK-19912][SQL] String literals should be escaped for...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17266 Can we say something more in the error message? We should explain that it's a Hive bug.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #74577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74577/testReport)** for PR 15945 at commit [`870222e`](https://github.com/apache/spark/commit/870222e8ec1b6e7aa32a0260a045192323ba8d30).
[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @jisookim0513 - created a new PR - https://github.com/apache/spark/pull/17297
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74576/testReport)** for PR 17166 at commit [`72b28cb`](https://github.com/apache/spark/commit/72b28cb0dc7aacf7cde1b5e49f05da49cb5de276).
[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17177 @ep1804 @jbax Thank you. I will cc and inform you both when I see a PR bumping the version up to 2.4.0 (or, more likely, when I open one myself).
[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape
Github user ep1804 closed the pull request at: https://github.com/apache/spark/pull/17177
[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape
Github user ep1804 commented on the issue: https://github.com/apache/spark/pull/17177 I agree with you, @HyukjinKwon; this PR will be closed for now and re-opened later. And thank you for the notice, @jbax!
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106066177

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
       logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
         s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
         s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
-      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
+      sched.backend.killTask(
+        attemptInfo.taskId,
+        attemptInfo.executorId,
+        interruptThread = true,
+        reason = "another attempt succeeded")
--- End diff --

I added two screenshots to the PR description. In the second scenario having a verbose reason is fine, but in the stage summary view long or many distinct reasons would overflow the progress bar. We could probably fix the CSS to allow slightly longer / more reasons, but even that wouldn't be great if each task had a different reason.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106065119

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -76,468 +118,499 @@ class SessionCatalogSuite extends PlanTest {
   }

   test("create databases using invalid names") {
-    val catalog = new SessionCatalog(newEmptyCatalog())
-    testInvalidName(name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+    withSessionCatalog(EMPTY) { catalog =>
+      testInvalidName(
+        name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+    }
   }

   test("get database when a database exists") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    val db1 = catalog.getDatabaseMetadata("db1")
-    assert(db1.name == "db1")
-    assert(db1.description.contains("db1"))
+    withSessionCatalog() { catalog =>
+      val db1 = catalog.getDatabaseMetadata("db1")
+      assert(db1.name == "db1")
+      assert(db1.description.contains("db1"))
+    }
   }

   test("get database should throw exception when the database does not exist") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    intercept[NoSuchDatabaseException] {
-      catalog.getDatabaseMetadata("db_that_does_not_exist")
+    withSessionCatalog() { catalog =>
+      intercept[NoSuchDatabaseException] {
+        catalog.getDatabaseMetadata("db_that_does_not_exist")
+      }
     }
   }

   test("list databases without pattern") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3"))
+    withSessionCatalog() { catalog =>
+      assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3"))
+    }
   }

   test("list databases with pattern") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    assert(catalog.listDatabases("db").toSet == Set.empty)
-    assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
-    assert(catalog.listDatabases("*1").toSet == Set("db1"))
-    assert(catalog.listDatabases("db2").toSet == Set("db2"))
+    withSessionCatalog() { catalog =>
+      assert(catalog.listDatabases("db").toSet == Set.empty)
+      assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
+      assert(catalog.listDatabases("*1").toSet == Set("db1"))
+      assert(catalog.listDatabases("db2").toSet == Set("db2"))
+    }
   }

   test("drop database") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
-    assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+    withSessionCatalog() { catalog =>
+      catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
+      assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+    }
   }

   test("drop database when the database is not empty") {
     // Throw exception if there are functions left
-    val externalCatalog1 = newBasicCatalog()
-    val sessionCatalog1 = new SessionCatalog(externalCatalog1)
-    externalCatalog1.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false)
-    externalCatalog1.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false)
-    intercept[AnalysisException] {
-      sessionCatalog1.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+    withSessionCatalogAndExternal() { (catalog, externalCatalog) =>
+      externalCatalog.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false)
+      externalCatalog.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false)
+      intercept[AnalysisException] {
+        catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+      }
     }
-
-    // Throw exception if there are tables left
-    val externalCatalog2 = newBasicCatalog()
-    val sessionCatalog2 = new SessionCatalog(externalCatalog2)
-    externalCatalog2.dropFunction("db2", "func1")
-    intercept[AnalysisException] {
-      sessionCatalog2.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+    withSessionCatalogAndExternal() { (catalog, externalCatalog) =>
+      // Throw exception if there are tables left
+      externalCatalog.dropFunction("db2", "func1")
+      intercept[AnalysisException] {
+        catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+      }
     }
-    // When cascade is true, it should drop them
-    val externalCatalog3 =
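The refactor in this diff replaces per-test catalog construction with a "loan pattern" helper that builds the fixture, lends it to the test body, and keeps setup (and any teardown) in one place. A minimal sketch of the same pattern, written here in Python for illustration — the `Catalog` class and names are stand-ins, not Spark's `SessionCatalog` API:

```python
import fnmatch
from contextlib import contextmanager

class Catalog:
    """Illustrative stand-in for SessionCatalog."""
    def __init__(self, dbs):
        self.dbs = set(dbs)

    def list_databases(self, pattern="*"):
        # Glob-style matching, like listDatabases("db*") in the suite above.
        return {db for db in self.dbs if fnmatch.fnmatch(db, pattern)}

@contextmanager
def with_catalog(initial=("default", "db1", "db2", "db3")):
    catalog = Catalog(initial)   # shared setup lives in one place
    try:
        yield catalog            # "loan" the fixture to the test body
    finally:
        pass                     # teardown/reset would go here in a real suite

# Each test body only states its assertions, mirroring the withSessionCatalog() refactor.
with with_catalog() as catalog:
    assert catalog.list_databases("db*") == {"db1", "db2", "db3"}
    assert catalog.list_databases("*1") == {"db1"}
```

The benefit is the same as in the Scala diff: when the fixture's construction changes, only the helper changes, not every test.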
[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15604 **[Test build #74575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74575/testReport)** for PR 15604 at commit [`17e11f0`](https://github.com/apache/spark/commit/17e11f0c56e2a581766c06bd52695c2b05bcfcb2).
[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/15604 @erenavsarogullari please file a JIRA when you see test failures instead of ignoring them. I updated https://issues.apache.org/jira/browse/SPARK-19803 for the first failure, but please file a JIRA for the second one.
[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/15604 Jenkins retest this please
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #74573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74573/testReport)** for PR 15945 at commit [`ea586cf`](https://github.com/apache/spark/commit/ea586cf6fb4464101c22ac98c4a5f5e08dfc5dbf).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74573/ Test FAILed.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Merged build finished. Test FAILed.
[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17232 **[Test build #74574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74574/testReport)** for PR 17232 at commit [`80f33da`](https://github.com/apache/spark/commit/80f33da13dd6e3bd9820ab6fdd641404f0ad2a0b).
[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17223 Since the Hive client is shared among all sessions, we can't set Hive conf dynamically if we want to keep session isolation. I think we should treat Hive conf entries as static SQL confs and throw an exception when users try to change them.
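The "static conf" behavior described above can be sketched as follows. This is a hedged illustration only (Spark's real handling lives in its SQL conf machinery and differs in detail); the key name is a hypothetical example:

```python
# Illustrative only: reject runtime SET for keys marked "static", because a
# client shared across sessions cannot hold a different value per session.
STATIC_KEYS = {"hive.exec.scratchdir"}  # hypothetical example of a static key

session_conf = {}

def set_conf(key, value):
    if key in STATIC_KEYS:
        raise RuntimeError(f"Cannot modify the value of a static config: {key}")
    session_conf[key] = value

set_conf("spark.sql.shuffle.partitions", "400")    # per-session: allowed
try:
    set_conf("hive.exec.scratchdir", "/tmp/other")  # static: rejected
except RuntimeError as e:
    print(e)
```

The design choice here is to fail loudly at SET time rather than silently accept a value that cannot actually take effect for one session without leaking into all of them.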
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #74573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74573/testReport)** for PR 15945 at commit [`ea586cf`](https://github.com/apache/spark/commit/ea586cf6fb4464101c22ac98c4a5f5e08dfc5dbf).
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Merged build finished. Test FAILed.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #74571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74571/testReport)** for PR 15945 at commit [`8e5d522`](https://github.com/apache/spark/commit/8e5d5226f7bb9bdf32cf93742f80fd44052f085e).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74571/ Test FAILed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74572/testReport)** for PR 17166 at commit [`31967d1`](https://github.com/apache/spark/commit/31967d185852870d8edecb855ea1aafb7bd04dd1).
[GitHub] spark issue #16788: [SPARK-16742] Kerberos impersonation support
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/16788

>Trying to put it differently: if Spark had its own, secure method for distributing the initial set of delegation tokens needed by the executors (+ AM in case of YARN), then the YARN backend wouldn't need to use amContainer.setTokens at all. What I'm suggesting here is that this method be the base of the Mesos / Kerberos integration; and later we could change YARN to also use it.

>This particular code is pretty self-contained and is the base of what you need here to bootstrap things. Moving it to "core" wouldn't be that hard, I think. The main thing would be to work on how the initial set of tokens is sent to executors, since that is the only thing YARN does for Spark right now.

Agreed, I'm also thinking about it. The main reason that currently only Spark on YARN supports delegation tokens (DTs) is that YARN helps propagate DTs during bootstrapping. If Spark had a common solution for this, it could support accessing kerberized services under different cluster managers. One simple way, as I prototyped before, is to pass the serialized credentials as an executor launch command argument; when the executor launches, it deserializes the credentials and sets them on the UGI.
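The bootstrap idea in the last paragraph — serialize credentials on the driver, pass them as a launch argument, restore them on the executor — can be sketched as below. This is an assumption-laden illustration in Python: the token dict stands in for Hadoop's `Credentials`, and a real executor would restore the tokens into its UGI, not a plain dict.

```python
import base64

def to_launch_arg(tokens):
    """Driver side: pack {alias: token_bytes} into one command-line argument."""
    blob = ";".join(
        f"{alias}:{base64.b64encode(data).decode('ascii')}"
        for alias, data in tokens.items()
    )
    return base64.b64encode(blob.encode("utf-8")).decode("ascii")

def from_launch_arg(arg):
    """Executor side: decode the argument back into {alias: token_bytes}."""
    blob = base64.b64decode(arg).decode("utf-8")
    tokens = {}
    for entry in filter(None, blob.split(";")):
        alias, enc = entry.split(":", 1)
        tokens[alias] = base64.b64decode(enc)
    return tokens

creds = {"hdfs-dt": b"opaque-token-bytes"}
arg = to_launch_arg(creds)        # driver: appended to the executor launch command
restored = from_launch_arg(arg)   # executor: done at startup, before tasks run
assert restored == creds
```

Note the caveat implicit in the discussion above: a launch-command argument may be visible to other users on the host, which is why a secure distribution channel is the real design question here.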
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106063119

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
       logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
         s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
         s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
-      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
+      sched.backend.killTask(
+        attemptInfo.taskId,
+        attemptInfo.executorId,
+        interruptThread = true,
+        reason = "another attempt succeeded")
--- End diff --

Can you post a screenshot of the relevant part of the UI? Is the problem just that the HTML properties don't allow columns to wrap?
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106063061

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -239,8 +244,9 @@ private[spark] class Executor(
      */
     @volatile var task: Task[Any] = _

-    def kill(interruptThread: Boolean): Unit = {
-      logInfo(s"Executor is trying to kill $taskName (TID $taskId)")
+    def kill(interruptThread: Boolean, reason: String): Unit = {
+      logInfo(s"Executor is trying to kill $taskName (TID $taskId), reason: $reason")
--- End diff --

Hm, good question. Looks fine.