[GitHub] spark issue #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTest

2017-03-21 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17368
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14694: [SPARK-17121][SPARKSUBMIT] Support _HOST replacement for...

2017-03-21 Thread wolf31o2
Github user wolf31o2 commented on the issue:

https://github.com/apache/spark/pull/14694
  
This is useful for the Spark HistoryServer, especially if it's configured 
to store history in HDFS. That's where I've run into this issue.



[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-21 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17324
  
Will take a look this week



[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14731
  
**[Test build #74990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)**
 for PR 14731 at commit 
[`a3aaf26`](https://github.com/apache/spark/commit/a3aaf267d2ac30c012b4a71b7a80e28a49ff10be).



[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-03-21 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16499
  
> @mallman can you send a new PR for 2.0? thanks!

Will do. Do I need to open a new JIRA ticket for that?



[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/17364
  
Looking some more: yes, as `tryWithSafeFinallyAndFailureCallbacks` wraps 
task commit, it guarantees that the original cause doesn't get lost. The 
abortJob code isn't so well guarded, and it looks like a failure there may hide a 
previous one (like a commitJob failure).
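The following is a minimal Python sketch of the guarding described above. It assumes hypothetical `commit` and `abort` callables. If the abort path also fails, its error is recorded on the original exception instead of replacing it. The sketch is only loosely modeled on the idea behind `tryWithSafeFinallyAndFailureCallbacks`, which in Spark's Scala source attaches later failures via Java's `addSuppressed`. The `suppressed` attribute is an invention of this sketch.

```python
from typing import Callable, List


def run_with_safe_abort(commit: Callable[[], None],
                        abort: Callable[[], None]) -> None:
    """Run commit(); if it fails, run abort() without losing the original error.

    Hypothetical helper: if abort() also raises, the abort error is appended to a
    `suppressed` list on the original exception instead of replacing it.
    """
    try:
        commit()
    except Exception as original:
        try:
            abort()
        except Exception as abort_error:
            # Record the secondary failure, but keep the commit failure primary.
            suppressed: List[BaseException] = getattr(original, "suppressed", [])
            suppressed.append(abort_error)
            setattr(original, "suppressed", suppressed)
        # Re-raise the commit failure, never the abort failure.
        raise original
```

Suppose `commit` raises `RuntimeError("commitJob failed")` and `abort` raises an `OSError`. The caller still sees the `RuntimeError`, and the `OSError` is available in `e.suppressed`.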



[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17295
  
**[Test build #74991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74991/testReport)**
 for PR 17295 at commit 
[`107e3e7`](https://github.com/apache/spark/commit/107e3e72e81d2c7813d832d3e9c2beab89e01379).



[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16499
  
You do not need to open a new JIRA. You can still use the same JIRA number.



[GitHub] spark issue #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTest

2017-03-21 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/17368
  
Yep, thanks for confirming that, @srowen, and for checking it out, @imatiach-msft 
and @MLnick!

Merging with master



[GitHub] spark pull request #17368: [SPARK-20039][ML] rename ChiSquare to ChiSquareTe...

2017-03-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17368



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17302
  
**[Test build #74988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74988/testReport)**
 for PR 17302 at commit 
[`d7d0a36`](https://github.com/apache/spark/commit/d7d0a36f6b4fb78cc0a3a13f870a41b03adf882f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17302
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74988/
Test FAILed.



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17302
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17350
  
**[Test build #74986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74986/testReport)**
 for PR 17350 at commit 
[`1260ef7`](https://github.com/apache/spark/commit/1260ef7baf3382fe3009302f37462e82d3550bb2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17350
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17350
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74986/
Test PASSed.



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/17302
  
Jenkins, retest this please



[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14731
  
**[Test build #74990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)**
 for PR 14731 at commit 
[`a3aaf26`](https://github.com/apache/spark/commit/a3aaf267d2ac30c012b4a71b7a80e28a49ff10be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14731
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/17166
  
jenkins retest this please



[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14731
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74990/
Test PASSed.



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17302
  
**[Test build #74992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74992/testReport)**
 for PR 17302 at commit 
[`d7d0a36`](https://github.com/apache/spark/commit/d7d0a36f6b4fb78cc0a3a13f870a41b03adf882f).



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #74993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74993/testReport)**
 for PR 17166 at commit 
[`884a3ad`](https://github.com/apache/spark/commit/884a3ad7308e69c0ca010c344133bcce6582920d).



[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17377#discussion_r107235501
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -369,10 +369,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
 :param maxCharsPerColumn: defines the maximum number of characters allowed for any given
   value being read. If None is set, it uses the default value,
   ``-1`` meaning unlimited length.
-:param maxMalformedLogPerPartition: sets the maximum number of malformed rows Spark will
-log for each partition. Malformed records beyond this
-number will be ignored. If None is set, it
-uses the default value, ``10``.
+:param maxMalformedLogPerPartition: previously sets the maximum number of malformed rows
+Spark will log. However, it does not log them after
+2.2.0. This parameter exists only for backwards
+compatibility for positional arguments.
--- End diff --

Let us simplify it to:
> This parameter is no longer used since Spark 2.2.0. If specified, it is ignored.
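The suggested wording describes a parameter that stays in the signature only so that existing positional calls keep lining up, while its value is ignored. Here is a minimal Python sketch. It uses a hypothetical `read_csv_options` function, not PySpark's real `csv()` signature.

```python
from typing import Dict, Optional


def read_csv_options(path: str,
                     max_chars_per_column: Optional[int] = None,
                     max_malformed_log_per_partition: Optional[int] = None,
                     mode: Optional[str] = None) -> Dict[str, object]:
    """Collect options for a hypothetical CSV reader.

    :param max_malformed_log_per_partition: no longer used. It is kept only so
        that positional arguments after it keep their meaning. If specified,
        it is ignored.
    """
    options: Dict[str, object] = {"path": path}
    if max_chars_per_column is not None:
        options["maxCharsPerColumn"] = max_chars_per_column
    # max_malformed_log_per_partition is deliberately not forwarded anywhere.
    if mode is not None:
        options["mode"] = mode
    return options
```

Suppose `max_malformed_log_per_partition` were simply removed. The positional call `read_csv_options("data.csv", -1, 10, "PERMISSIVE")` would then fail with a `TypeError` (too many positional arguments), and shorter positional calls would silently shift values into the wrong parameters.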



[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17377#discussion_r107235659
  
--- Diff: python/pyspark/sql/streaming.py ---
@@ -625,6 +625,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
 :param maxCharsPerColumn: defines the maximum number of characters allowed for any given
   value being read. If None is set, it uses the default value,
   ``-1`` meaning unlimited length.
+:param maxMalformedLogPerPartition: previously sets the maximum number of malformed rows
+Spark will log. However, it does not log them after
+2.2.0. This parameter exists only for backwards
+compatibility for positional arguments.
--- End diff --

The same here.



[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17350
  
LGTM

Since it is very close to code freeze, let me merge it to master and 2.1 
first. You can address the issues in a follow-up PR. Thanks!



[GitHub] spark pull request #17350: [SPARK-20017][SQL] change the nullability of func...

2017-03-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17350



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107233146
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -302,12 +298,12 @@ private[spark] class Executor(
 
 // If this task has been killed before we deserialized it, let's quit now. Otherwise,
 // continue executing the task.
-if (killed) {
+if (maybeKillReason.isDefined) {
   // Throw an exception rather than returning, because returning within a try{} block
   // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
   // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
   // for the task.
-  throw new TaskKilledException
+  throw new TaskKilledException(maybeKillReason.get)
--- End diff --

Same as above here: atomic use of `maybeKillReason` is required.



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107239694
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
   taskState: TaskState,
   reason: TaskFailedReason): Unit = synchronized {
 taskSetManager.handleFailedTask(tid, taskState, reason)
-if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
+if (!taskSetManager.isZombie) {
--- End diff --

@ericl Actually, that is not correct.
Killed tasks were not candidates for resubmission on failure, and hence 
there is no need to revive offers when task kills are detected.

If they are to be made candidates, we need to make this expectation 
explicit elsewhere too, to be consistent.
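To make the two positions concrete, here is a Python sketch of the old and new conditions from the diff. It uses a hypothetical `TaskState` enum that holds a few of Spark's task states. Under mridulm's reading, killed tasks are never resubmitted, so reviving offers after a kill schedules nothing new.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical subset of Spark's task states, for illustration only.
    FAILED = "FAILED"
    KILLED = "KILLED"
    LOST = "LOST"


def should_revive_before(is_zombie: bool, state: TaskState) -> bool:
    # Original condition: killed tasks are not candidates for resubmission,
    # so a kill adds nothing to schedule and offers need not be revived.
    return not is_zombie and state != TaskState.KILLED


def should_revive_after(is_zombie: bool, state: TaskState) -> bool:
    # Condition in the diff: revive offers for any failure, kills included,
    # as long as the task set is not a zombie.
    return not is_zombie
```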



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107235395
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -569,8 +575,10 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   Thread.sleep(999)
 }
 // second attempt succeeds immediately
+SparkContextSuite.taskSucceeded = true
   }
 }
+assert(SparkContextSuite.taskSucceeded)
--- End diff --

The listener and the task are both setting `taskSucceeded`? That does not 
look right...
I am assuming we need one failure to be raised with the appropriate 
message, and one task success, to ensure the listener succeeds.
Additionally, the task should be re-executed to show that it eventually succeeds (though this 
aspect should already be covered by some other test).



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107234055
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -59,8 +59,8 @@ private[spark] class TaskContextImpl(
   /** List of callback functions to execute when the task fails. */
   @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
 
-  // Whether the corresponding task has been killed.
-  @volatile private var interrupted: Boolean = false
+  // If defined, the corresponding task has been killed for the contained reason.
+  @volatile private var maybeKillReason: Option[String] = None
--- End diff --

nit: Overloading `maybeKillReason` to indicate the `interrupted` status smells 
a bit, but might be OK for now.



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107234445
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
 
   // A flag to indicate whether the task is killed. This is used in case context is not yet
   // initialized when kill() is invoked.
-  @volatile @transient private var _killed = false
+  @volatile @transient private var _maybeKillReason: String = null
--- End diff --

Any reason to make this a `String` and not an `Option[String]`, like the other places 
where it is defined and used?



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107237325
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -215,7 +215,8 @@ private[spark] class PythonRunner(
 
   case e: Exception if context.isInterrupted =>
 logDebug("Exception thrown after task interruption", e)
-throw new TaskKilledException
+context.killTaskIfInterrupted()
+null  // not reached
--- End diff --

nit: It would be good if we could directly throw the exception here, 
instead of relying on `killTaskIfInterrupted` to do the right thing (the task is 
already interrupted according to the `case` guard).
Not only will this remove the unreachable `null`, it will also ensure that future 
changes to `killTaskIfInterrupted`, interrupt reset, etc. do not break this.
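Here is a Python analogue of the suggestion. `FakeTaskContext` and `TaskKilledError` are hypothetical stand-ins for Spark's `TaskContext` and `TaskKilledException`. Once the handler has established that the task was interrupted, it raises the kill exception itself. There is no helper call whose behavior might change later, and no unreachable placeholder return.

```python
from typing import Callable, Optional


class TaskKilledError(Exception):
    """Hypothetical stand-in for Spark's TaskKilledException."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class FakeTaskContext:
    """Hypothetical task context that only tracks an optional kill reason."""

    def __init__(self, kill_reason: Optional[str] = None) -> None:
        self.kill_reason = kill_reason


def run_worker(context: FakeTaskContext, work: Callable[[], object]) -> object:
    try:
        return work()
    except Exception as e:
        reason = context.kill_reason  # read the shared field once
        if reason is not None:
            # Interrupted: raise the kill exception directly, keeping the
            # original error as the cause. No helper, no placeholder return.
            raise TaskKilledError(reason) from e
        raise  # not interrupted: propagate the original error unchanged
```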



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107235703
  
--- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
@@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason {
  * Task was killed intentionally and needs to be rescheduled.
  */
 @DeveloperApi
-case object TaskKilled extends TaskFailedReason {
-  override def toErrorString: String = "TaskKilled (killed intentionally)"
+case class TaskKilled(reason: String) extends TaskFailedReason {
+  override def toErrorString: String = s"TaskKilled ($reason)"
--- End diff --

That is unfortunate, but it looks like it can't be helped if we need this
feature.
Probably something to keep in mind for future use of case objects!
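A minimal sketch of why this change ripples out to callers. It uses a hypothetical, simplified hierarchy (`FailedReason`, `KilledV1`, `KilledV2`), not Spark's real `TaskEndReason` types. Code that matched on the singleton has to switch to an extractor pattern once the type becomes a case class that carries data.

```scala
// Hypothetical, simplified hierarchy (not Spark's real TaskEndReason classes).
sealed trait FailedReason { def toErrorString: String }

// Before: a case object carries no data, so callers match on the singleton itself.
case object KilledV1 extends FailedReason {
  override def toErrorString: String = "TaskKilled (killed intentionally)"
}

// After: a case class can carry the kill reason, but every `case KilledV1 =>`
// pattern in callers must become an extractor pattern such as `case KilledV2(_) =>`.
case class KilledV2(reason: String) extends FailedReason {
  override def toErrorString: String = s"TaskKilled ($reason)"
}

object CaseObjectToClassSketch {
  def describe(r: FailedReason): String = r match {
    case KilledV1         => "killed (no reason available)"
    case KilledV2(reason) => s"killed: $reason"
  }
}
```

Starting with a case class, even one with no fields yet, leaves room to add data later without forcing every caller's patterns to change.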

Thx for clarifying.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107232894
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -140,16 +140,22 @@ private[spark] class TaskContextImpl(
   }
 
   /** Marks the task for interruption, i.e. cancellation. */
-  private[spark] def markInterrupted(): Unit = {
-    interrupted = true
+  private[spark] def markInterrupted(reason: String): Unit = {
+    maybeKillReason = Some(reason)
+  }
+
+  private[spark] override def killTaskIfInterrupted(): Unit = {
+    if (maybeKillReason.isDefined) {
+      throw new TaskKilledException(maybeKillReason.get)
--- End diff --

This is not thread-safe: `maybeKillReason` is read twice, once in `isDefined`
and again in `get`. Technically we do not allow the kill reason to be reset to
`None` right now, so this might be fine, but it could lead to issues in the future.

Either make all accesses/updates to the kill reason synchronized, or capture
`maybeKillReason` in a local variable and use that in both the `if` and the `throw`.
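The second option can be sketched as follows. `KillableContext` is a hypothetical simplification of `TaskContextImpl` that reuses the diff's field and method names. It is not Spark's actual class. The point is that `killTaskIfInterrupted` reads the volatile field once into a local, so the `isDefined` check and the `get` always see the same value.

```scala
// Hypothetical simplification of TaskContextImpl; names mirror the diff above,
// but this is not Spark's actual class.
class TaskKilledException(val reason: String) extends RuntimeException(reason)

class KillableContext {
  // One volatile reference: writing Some(reason) publishes "killed" and the reason together.
  @volatile private var maybeKillReason: Option[String] = None

  def markInterrupted(reason: String): Unit = {
    maybeKillReason = Some(reason)
  }

  // Pattern from the diff: two separate volatile reads. If future code reset the
  // field to None between them, `.get` would throw NoSuchElementException.
  def killTaskIfInterruptedTwoReads(): Unit = {
    if (maybeKillReason.isDefined) {
      throw new TaskKilledException(maybeKillReason.get)
    }
  }

  // Suggested pattern: read the volatile field once and use only the local copy.
  def killTaskIfInterrupted(): Unit = {
    val reason = maybeKillReason
    if (reason.isDefined) {
      throw new TaskKilledException(reason.get)
    }
  }
}
```

`maybeKillReason.foreach(r => throw new TaskKilledException(r))` is an equivalent single-read form. The `synchronized` alternative would also work, but it takes a lock on every check.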


[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17350
  
@zhaorongsheng Could you help us check whether any other functions also have
incorrect nullability settings?


[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17130
  
**[Test build #74994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74994/testReport)**
 for PR 17130 at commit 
[`9fef280`](https://github.com/apache/spark/commit/9fef280751378dbeaa843c673fd962192320a5b1).


[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...

2017-03-21 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17290
  
I agree with @srowen: I don't see how this change affects the test.
`blocksWithReleasedLocks` should be unchanged w.r.t. this test.


[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17377#discussion_r107243921
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala ---
@@ -17,25 +17,35 @@
 
 package org.apache.spark.sql.catalyst.util
 
-object ParseModes {
-  val PERMISSIVE_MODE = "PERMISSIVE"
-  val DROP_MALFORMED_MODE = "DROPMALFORMED"
-  val FAIL_FAST_MODE = "FAILFAST"
+import org.apache.spark.internal.Logging
 
-  val DEFAULT = PERMISSIVE_MODE
+object ParseMode extends Enumeration with Logging {
--- End diff --

Not sure whether we should use a Java enum instead. cc @cloud-fan
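For comparison, here is a hedged sketch of what an `Enumeration`-based parse mode could look like, using the three mode strings from the removed `ParseModes` object. `ParseModeSketch`, its member names, and the fallback to `PERMISSIVE` for unknown strings are assumptions. The `Logging` mixin is omitted so the example runs without Spark.

```scala
import java.util.Locale

// Hypothetical sketch; not the API that this PR actually adds.
object ParseModeSketch extends Enumeration {
  val Permissive: Value = Value("PERMISSIVE")
  val DropMalformed: Value = Value("DROPMALFORMED")
  val FailFast: Value = Value("FAILFAST")

  val Default: Value = Permissive

  /** Case-insensitive lookup; unknown strings fall back to the default mode. */
  def fromString(mode: String): Value =
    values.find(_.toString == mode.toUpperCase(Locale.ROOT)).getOrElse(Default)
}
```

`Enumeration.withName` would throw `NoSuchElementException` for an unknown string instead of falling back. One trade-off worth weighing: `Enumeration` members all share the type `ParseModeSketch.Value`, so `match` expressions over them are not checked for exhaustiveness. A sealed trait of case objects does get that check, while a Java enum is the more natural choice for Java callers.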


[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

2017-03-21 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17360
  
@yanboliang Thanks for your feedback! The design of the optimizer
interface, or even whether it should be included at all, is definitely open for
discussion, and your suggestions are much appreciated. If SPARK-17136 proceeds
as you suggest (an internal optimization API that allows users to register
optimizers), then it is possible that this PR does not conflict with that JIRA
(though I don't know the details of that proposal, so I'm not sure even of that).
However, that matter is far from settled. If we end up deciding to provide the
external optimizer API as currently suggested in that JIRA, then these two
_do_ conflict: if we add the ability to specify parameter bounds on the
estimator and then add an optimizer API, we will have added yet more optimizer
parameters to the estimator that can conflict with the parameters of the
optimizer provided to the estimator.

My point is that I think these are two competing approaches and we should 
settle on one over the other before we make API changes that cannot be undone. 
I'm open to potentially changing the design of SPARK-17136, but we need to 
decide on something first. 


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

2017-03-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17377
  
So far, the documentation for these data source options is missing. In the
last release, we cleaned up the [JDBC
options](http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
in the documentation. Do you think you have the bandwidth to do the same for
CSV and JSON?




[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17336
  
**[Test build #74995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74995/testReport)**
 for PR 17336 at commit 
[`9c046c3`](https://github.com/apache/spark/commit/9c046c3bb8dfd6dd0fa2799d434a4f92cbb1b802).


[GitHub] spark pull request #17378: [SPARK-20046][SQL] Facilitate loop optimizations ...

2017-03-21 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/17378

[SPARK-20046][SQL] Facilitate loop optimizations in a JIT compiler 
regarding sqlContext.read.parquet()

## What changes were proposed in this pull request?

This PR improves the performance of operations with `sqlContext.read.parquet()`
by changing the Java code generated by Catalyst. It is inspired by [the blog
article](https://databricks.com/blog/2017/02/16/processing-trillion-rows-per-second-single-machine-can-nested-loop-joins-fast.html)
and [this Stack Overflow
entry](http://stackoverflow.com/questions/40629435/fast-parquet-row-count-in-spark).

This PR changes the generated code in the following two ways:
1. Replace a while-loop that uses long instance variables with a for-loop that
uses int local variables.
2. Suppress generation of the `shouldStop()` method when it is unnecessary
(e.g. when `append()` is not generated).

These changes facilitate optimizations in a JIT compiler by feeding it simpler
Java code. The performance of `sqlContext.read.parquet().count` improves by 1.09x.
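The first change can be illustrated with a simplified sketch. It is written in Scala, the language of this document's other examples, rather than the Java that Catalyst generates. `BatchLike`, `FieldCursorCounter`, and `LocalCursorCounter` are hypothetical names, and only the loop shape matters. HotSpot's JIT generally optimizes `int`-indexed counted loops over local variables (for example, with unrolling and range-check elimination) more aggressively than loops driven by `long` instance fields.

```scala
// Hypothetical stand-in for a columnar batch; only its row count matters here.
final class BatchLike(val numRows: Int)

// Before: the cursor and accumulator are long instance fields, so every iteration
// may need to load and store them through `this`.
final class FieldCursorCounter {
  private var batchIdx: Long = 0L
  private var count: Long = 0L

  def countRows(batch: BatchLike): Long = {
    batchIdx = 0L
    count = 0L
    while (batchIdx < batch.numRows) {
      count += 1
      batchIdx += 1
    }
    count
  }
}

// After: an int local induction variable with a loop-invariant bound, the
// "counted loop" shape that JIT compilers tend to optimize best.
object LocalCursorCounter {
  def countRows(batch: BatchLike): Long = {
    var count = 0L
    val n = batch.numRows
    var i = 0
    while (i < n) {
      count += 1
      i += 1
    }
    count
  }
}
```

Both versions compute the same result. Only the shape that the JIT compiler sees differs.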

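Benchmark program:
```scala
// Assumes an existing SparkSession named `sparkSession` and Spark's internal
// benchmark utility (org.apache.spark.util.Benchmark in Spark 2.x test code).
import scala.concurrent.duration._
import org.apache.spark.util.Benchmark

val dir = "/dev/shm/parquet"
val N = 1000 * 1000 * 40
val iters = 20
// Each case reads N rows per query and repeats the query `iters` times.
val benchmark = new Benchmark("Parquet", N * iters, minNumIters = 5, warmupTime = 30.seconds)
sparkSession.range(N).write.mode("overwrite").parquet(dir)

benchmark.addCase("count") { i: Int =>
  var n = 0
  var len = 0L
  while (n < iters) {
    len += sparkSession.read.parquet(dir).count
    n += 1
  }
}
benchmark.run
```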

Performance result without this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-47-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
Parquet:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
w/o this PR                                   1152 / 1211        694.7           1.4       1.0X
```

Performance result with this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-47-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
Parquet:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
with this PR                                  1053 / 1121        760.0           1.3       1.0X
```

Here is a comparison between the generated code without and with this PR. Only
the method `agg_doAggregateWithoutKey` is changed.

Generated code without this PR
```java
final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
  private Object[] references;
  private scala.collection.Iterator[] inputs;
  private boolean agg_initAgg;
  private boolean agg_bufIsNull;
  private long agg_bufValue;
  private scala.collection.Iterator scan_input;
  private org.apache.spark.sql.execution.metric.SQLMetric scan_numOutputRows;
  private org.apache.spark.sql.execution.metric.SQLMetric scan_scanTime;
  private long scan_scanTime1;
  private org.apache.spark.sql.execution.vectorized.ColumnarBatch scan_batch;
  private int scan_batchIdx;
  private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows;
  private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime;
  private UnsafeRow agg_result;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;

  public GeneratedIterator(Object[] references) {
    this.references = references;
  }

  public void init(int index, scala.collection.Iterator[] inputs) {
    partitionIndex = index;
    this.inputs = inputs;
    agg_initAgg = false;

    scan_input = inputs[0];
    this.scan_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
    this.scan_scanTime = (org.apache.spark.sql.execution.metric.SQLMetric) references[1];
    scan_scanTime1 = 0;
    scan_batch = null;
    scan_batchIdx = 0;
    this.agg_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[2];
    this.agg_aggTime = (org.apache.spark.sql.execution.metric.SQLMetric) r
```

[GitHub] spark issue #17378: [SPARK-20046][SQL] Facilitate loop optimizations in a JI...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17378
  
**[Test build #74996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74996/testReport)**
 for PR 17378 at commit 
[`d74b6cf`](https://github.com/apache/spark/commit/d74b6cf5fb63479040e940e5797e0b226367b227).


[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17130
  
**[Test build #74994 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74994/testReport)**
 for PR 17130 at commit 
[`9fef280`](https://github.com/apache/spark/commit/9fef280751378dbeaa843c673fd962192320a5b1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74994/
Test PASSed.


[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17130
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

2017-03-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17377
  
Definitely. Thanks for asking. Let me open another PR soon for both.


[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17295
  
**[Test build #74991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74991/testReport)**
 for PR 17295 at commit 
[`107e3e7`](https://github.com/apache/spark/commit/107e3e72e81d2c7813d832d3e9c2beab89e01379).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74991/
Test FAILed.


[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17295
  
Merged build finished. Test FAILed.


[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17336
  
**[Test build #74995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74995/testReport)**
 for PR 17336 at commit 
[`9c046c3`](https://github.com/apache/spark/commit/9c046c3bb8dfd6dd0fa2799d434a4f92cbb1b802).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17336
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17336
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74995/
Test PASSed.


[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17295
  
**[Test build #74997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74997/testReport)**
 for PR 17295 at commit 
[`6bda670`](https://github.com/apache/spark/commit/6bda6701bf0c266047a5fa81fd29f4fb826728c7).


[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-03-21 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/17371
  
I really think the core problem here is that we allow you to use resolved 
attributes at all in the user API.  Unfortunately we are somewhat stuck with 
that bad decision.  Personally, I never use `df['col']` and only ever use 
`col("col")` since that avoids the problem.

However, I don't think that piecemeal switching to unresolved attributes is 
a good idea.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #74993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74993/testReport)**
 for PR 17166 at commit 
[`884a3ad`](https://github.com/apache/spark/commit/884a3ad7308e69c0ca010c344133bcce6582920d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TaskKilledException(val reason: String) extends RuntimeException`


[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2017-03-21 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/14617
  
After reading through the previous comments, I agree that adding checkboxes to
this page is a good idea. I would even suggest that we look at making
checkboxes for a few of the current columns (defaulting to shown, to keep user
compatibility). I'm not sure which would be best, but I know that on many apps
a few columns are never filled (disk usage and shuffle read/write first come to
mind).


[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17302
  
**[Test build #74992 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74992/testReport)**
 for PR 17302 at commit 
[`d7d0a36`](https://github.com/apache/spark/commit/d7d0a36f6b4fb78cc0a3a13f870a41b03adf882f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74993/
Test FAILed.


[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17302
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74992/
Test PASSed.


[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17302
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17166
  
Hi @kayousterhout,
  Can you take over reviewing this PR? I might be tied up with other
things for the next couple of weeks, and I don't want @ericl's work to be
blocked on me.

Thx


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107271185
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -59,8 +59,8 @@ private[spark] class TaskContextImpl(
   /** List of callback functions to execute when the task fails. */
   @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
 
-  // Whether the corresponding task has been killed.
-  @volatile private var interrupted: Boolean = false
+  // If defined, the corresponding task has been killed for the contained reason.
+  @volatile private var maybeKillReason: Option[String] = None
--- End diff --

Yeah, the reason here is to allow this to be set atomically.
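A minimal sketch of why a single `Option[String]` helps, assuming a simplified stand-in for the task context (`KillableContext` and `TaskKilled` are hypothetical names, not Spark's API): with one `@volatile` reference, "killed" and "why" are published by the same write, so a reader never sees the kill flag without its reason.

```scala
// Hypothetical stand-in for Spark's TaskKilledException.
final class TaskKilled(val reason: String) extends RuntimeException(reason)

// Hypothetical, simplified model of the kill-reason field; not Spark's TaskContextImpl.
final class KillableContext {
  // One volatile reference holds both "killed?" and "why": a single write
  // publishes them together, so no reader sees a kill without its reason.
  @volatile private var maybeKillReason: Option[String] = None

  def markInterrupted(reason: String): Unit = {
    maybeKillReason = Some(reason)
  }

  def isInterrupted: Boolean = maybeKillReason.isDefined

  def killTaskIfInterrupted(): Unit = {
    // Read the volatile field once so the check and the reason come from the same value.
    val snapshot = maybeKillReason
    snapshot.foreach(reason => throw new TaskKilled(reason))
  }
}
```

With a separate `Boolean` and `String`, two writes would be needed, and a concurrent reader could observe the flag set before the reason.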





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107273296
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -302,12 +298,12 @@ private[spark] class Executor(
 
 // If this task has been killed before we deserialized it, let's quit now. Otherwise,
 // continue executing the task.
-if (killed) {
+if (maybeKillReason.isDefined) {
   // Throw an exception rather than returning, because returning within a try{} block
   // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
   // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
   // for the task.
-  throw new TaskKilledException
+  throw new TaskKilledException(maybeKillReason.get)
--- End diff --

Fixed
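The comment in the diff above refers to how Scala 2 implements `return` from inside a closure: it throws `scala.runtime.NonLocalReturnControl`, and any enclosing catch-all sees it first. A small self-contained illustration follows (the object and method names are hypothetical; Scala 3 deprecates non-local returns but still compiles them the same way).

```scala
import scala.util.control.NonFatal

object NonLocalReturnDemo {
  // `return` inside the lambda passed to foreach is a non-local return:
  // it is implemented by throwing scala.runtime.NonLocalReturnControl.
  def catchAll(xs: List[Int]): String = {
    try {
      xs.foreach { x => if (x > 0) return s"found $x" }
      "none"
    } catch {
      // A catch-all intercepts the control-flow throwable and swallows it,
      // so the intended return value is lost.
      case _: Throwable => "swallowed"
    }
  }

  def catchNonFatal(xs: List[Int]): String = {
    try {
      xs.foreach { x => if (x > 0) return s"found $x" }
      "none"
    } catch {
      // NonFatal does not match ControlThrowable (NonLocalReturnControl's
      // supertype), so the return propagates as intended.
      case NonFatal(_) => "swallowed"
    }
  }
}
```

Throwing an explicit exception, as the executor does, sidesteps the problem because the catch block then sees a real failure it is meant to handle.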





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107272852
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -215,7 +215,8 @@ private[spark] class PythonRunner(
 
   case e: Exception if context.isInterrupted =>
 logDebug("Exception thrown after task interruption", e)
-throw new TaskKilledException
+context.killTaskIfInterrupted()
+null  // not reached
--- End diff --

Done
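The `null  // not reached` line in the diff is needed because a catch branch must have the method's result type, and a helper declared to return `Unit` gives the compiler no way to know it always throws. A sketch of the same shape, assuming hypothetical `Ctx` and `Killed` classes in place of Spark's `TaskContext` and `TaskKilledException`:

```scala
// Hypothetical stand-in for Spark's TaskKilledException.
final class Killed(reason: String) extends RuntimeException(reason)

// Hypothetical, simplified context: only what the sketch needs.
final class Ctx(val killReason: Option[String]) {
  def isInterrupted: Boolean = killReason.isDefined
  // Declared as Unit, so the compiler cannot tell that it always throws
  // when isInterrupted is true.
  def killTaskIfInterrupted(): Unit = killReason.foreach(r => throw new Killed(r))
}

object NotReachedDemo {
  def read(ctx: Ctx, fail: Boolean): String =
    try {
      if (fail) throw new java.io.IOException("worker pipe closed")
      "data"
    } catch {
      case _: Exception if ctx.isInterrupted =>
        ctx.killTaskIfInterrupted()
        null // not reached: the guard guarantees the call above throws
    }
}
```

Declaring the helper's result type as `Nothing` would remove the need for the placeholder, but then it could only be called when a kill is known to be pending.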





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107274054
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
   taskState: TaskState,
   reason: TaskFailedReason): Unit = synchronized {
 taskSetManager.handleFailedTask(tid, taskState, reason)
-if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
+if (!taskSetManager.isZombie) {
--- End diff --

There is no need, but reviving offers has no effect either way. Those tasks will not be resubmitted even if reviveOffers() is called (in fact, reviveOffers() is called periodically on a timer thread, so if this were an issue we would have already seen it).





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107271262
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -140,16 +140,22 @@ private[spark] class TaskContextImpl(
   }
 
   /** Marks the task for interruption, i.e. cancellation. */
-  private[spark] def markInterrupted(): Unit = {
-interrupted = true
+  private[spark] def markInterrupted(reason: String): Unit = {
+maybeKillReason = Some(reason)
+  }
+
+  private[spark] override def killTaskIfInterrupted(): Unit = {
+if (maybeKillReason.isDefined) {
+  throw new TaskKilledException(maybeKillReason.get)
--- End diff --

Done





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107273498
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
 
   // A flag to indicate whether the task is killed. This is used in case context is not yet
   // initialized when kill() is invoked.
-  @volatile @transient private var _killed = false
+  @volatile @transient private var _maybeKillReason: String = null
--- End diff --

This one gets deserialized to null sometimes, so it seemed cleaner to use a 
bare string.
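A sketch of why an `Option` field is awkward here, assuming a hypothetical `FakeTask` class rather than Spark's `Task`: Java serialization skips `@transient` fields and does not run the serializable class's constructor on deserialization, so a field initialized to `None` comes back as `null`, exactly like a bare `String`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical task-like class; only the field shapes matter here.
class FakeTask extends Serializable {
  // Transient fields are not written; after deserialization they hold the
  // JVM default (null) because field initializers are not re-run.
  @transient var optionReason: Option[String] = None
  @volatile @transient var bareReason: String = null
}

object TransientDemo {
  // Serialize and deserialize an object in memory.
  def roundTrip[T <: Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Code that calls `optionReason.isDefined` on a deserialized copy would throw a `NullPointerException`, whereas a `null` check on a bare `String` treats "not killed" and "just deserialized" the same way.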





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-21 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17355
  
Jenkins retest this please





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #74999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74999/testReport)**
 for PR 17166 at commit 
[`203a900`](https://github.com/apache/spark/commit/203a90020031b71d976f60491d757c4d78b85517).





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17355
  
**[Test build #74998 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74998/testReport)**
 for PR 17355 at commit 
[`267837c`](https://github.com/apache/spark/commit/267837cd741b9a1d50842e485c20033aa9b77f8f).





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #75000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75000/testReport)**
 for PR 17166 at commit 
[`6e8593b`](https://github.com/apache/spark/commit/6e8593b9bb88a2b0bf90e39887368cc4535480b6).





[GitHub] spark issue #16596: [SPARK-19237][SPARKR][CORE] On Windows spark-submit shou...

2017-03-21 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/16596
  
Ah, we are only changing this on Windows, I see. This is a lower-risk change, then. LGTM. Merging this to master, branch-2.1

cc @HyukjinKwon 





[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17361
  
**[Test build #75001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75001/testReport)**
 for PR 17361 at commit 
[`6759165`](https://github.com/apache/spark/commit/6759165f9b6d26c87b94e7acc40914ae4ca37a89).





[GitHub] spark pull request #17379: [SPARK-20048][SQL] Cloning SessionState does not ...

2017-03-21 Thread kunalkhamar
GitHub user kunalkhamar opened a pull request:

https://github.com/apache/spark/pull/17379

[SPARK-20048][SQL] Cloning SessionState does not clone query execution listeners

## What changes were proposed in this pull request?

Bugfix from SPARK-19540.
Cloning SessionState does not clone query execution listeners, so the cloned session is unable to listen to events on queries.
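A minimal sketch of the bug and the fix, using hypothetical, simplified classes (`QueryListener`, `ListenerManager`, `Session`) rather than Spark's `SessionState` or `ExecutionListenerManager`: a clone that builds a fresh, empty listener manager drops every registration, while cloning the manager carries them over.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-ins; not Spark's actual classes.
trait QueryListener { def onSuccess(query: String): Unit }

final class ListenerManager {
  private val listeners = ArrayBuffer.empty[QueryListener]
  def register(l: QueryListener): Unit = synchronized { listeners += l }
  def onSuccess(query: String): Unit = synchronized { listeners.foreach(_.onSuccess(query)) }
  // Copies the registrations into a new, independent manager.
  override def clone(): ListenerManager = synchronized {
    val copy = new ListenerManager
    listeners.foreach(l => copy.register(l))
    copy
  }
}

final class Session(val listenerManager: ListenerManager) {
  // Buggy clone: the new session gets an empty manager, so listeners
  // registered on the original never hear about its queries.
  def cloneBuggy(): Session = new Session(new ListenerManager)
  // Fixed clone: registrations are copied to the new session.
  def cloneFixed(): Session = new Session(listenerManager.clone())
}
```

Copying into a new manager, rather than sharing the same instance, keeps listeners added later to one session from leaking into the other.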

## How was this patch tested?

- Unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunalkhamar/spark clone-bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17379.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17379


commit ad77fe9ad258eac224f069bbc89294818ee6b549
Author: Kunal Khamar 
Date:   2017-03-21T21:16:04Z

Fix cloning of listener manager. Remove redundant comments.







[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74999/
Test FAILed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #74999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74999/testReport)**
 for PR 17166 at commit 
[`203a900`](https://github.com/apache/spark/commit/203a90020031b71d976f60491d757c4d78b85517).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17379
  
**[Test build #75002 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75002/testReport)**
 for PR 17379 at commit 
[`ad77fe9`](https://github.com/apache/spark/commit/ad77fe9ad258eac224f069bbc89294818ee6b549).





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #75000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75000/testReport)**
 for PR 17166 at commit 
[`6e8593b`](https://github.com/apache/spark/commit/6e8593b9bb88a2b0bf90e39887368cc4535480b6).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75000/
Test FAILed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16596: [SPARK-19237][SPARKR][CORE] On Windows spark-subm...

2017-03-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16596





[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17361
  
**[Test build #75003 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75003/testReport)**
 for PR 17361 at commit 
[`d0758eb`](https://github.com/apache/spark/commit/d0758ebd6b78c6cde97e9750275a0fbba93da764).





[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD

2017-03-21 Thread danielyli
Github user danielyli commented on the issue:

https://github.com/apache/spark/pull/15899
  
Hello,

I found this issue after encountering the error `'withFilter' method does not yet exist on RDD[(Int, Double)], using 'filter' method instead` in my code.

I'm writing a somewhat complicated `flatMap`-`flatMap`-`map` expression 
involving pair RDDs, and the code is becoming busy enough that sugaring them 
into a `for` expression is warranted for readability.  Since I'm not using any 
`filter`s or `if`s in the `for` expression, I found the above error message 
puzzling.  After some tinkering, I think I've found a minimal reproducible case:

```scala
for ((k, v) <- pairRdd) yield ...  // pairRdd is of type RDD[(_, _)]
```

Curiously, the `withFilter` error doesn't occur if I write `for (x <- pairRdd) yield ...`.  @rxin, do you have any insight into this?
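In Scala 2, a generator with a tuple pattern such as `(k, v) <- xs` is desugared with a `withFilter` step that checks the pattern matches, whereas a plain variable such as `x <- xs` is not; since `RDD` had no `withFilter`, the compiler fell back to `filter` with the message quoted above. A sketch of the approximate desugaring on a plain `List` (the object name is hypothetical), together with the usual workaround of destructuring inside `map`:

```scala
object ForDesugarDemo {
  val pairs: List[(Int, Double)] = List((1, 2.0), (2, 3.5))

  // Approximately what Scala 2 generates for
  //   for ((k, v) <- pairs) yield k * v
  // The tuple pattern becomes a filter, so withFilter is called first.
  val desugared: List[Double] =
    pairs
      .withFilter {
        case (_, _) => true
        case _      => false
      }
      .map { case (k, v) => k * v }

  // Workaround: no pattern in the generator position, so no filter step;
  // the tuple is destructured inside the function instead.
  val withoutFilter: List[Double] = pairs.map { case (k, v) => k * v }
}
```

On an `RDD`, the same workaround reads `pairRdd.map { case (k, v) => ... }`.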





[GitHub] spark issue #17378: [SPARK-20046][SQL] Facilitate loop optimizations in a JI...

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17378
  
**[Test build #74996 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74996/testReport)**
 for PR 17378 at commit 
[`d74b6cf`](https://github.com/apache/spark/commit/d74b6cf5fb63479040e940e5797e0b226367b227).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17378: [SPARK-20046][SQL] Facilitate loop optimizations in a JI...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17378
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17378: [SPARK-20046][SQL] Facilitate loop optimizations in a JI...

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17378
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74996/
Test PASSed.





[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #75004 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75004/consoleFull)**
 for PR 11119 at commit 
[`6f169eb`](https://github.com/apache/spark/commit/6f169ebf8c0c832010d2dbd8f971cfabff7870f2).





[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-21 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107281316
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
+#' PFP distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param itemsCol Items column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_items", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(items = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, ',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5
+#' itemsCol = "baskets", numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   itemsCol = "items", numPartitions = -1) {
--- End diff --

Correct me if I am wrong, but this cannot be done like this. If we want to default to `NULL` (I am not fond of this idea), we have to pass the argument as a `character` / `String` and parse it once in the JVM.
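A sketch of the JVM-side parsing described above, assuming the R wrapper passes `numPartitions` as a string and that `NULL` arrives as `null` (the helper name `NumPartitionsParam.parse` is hypothetical, not part of Spark): the value is parsed once into an `Option[Int]`, so "not set" is explicit instead of encoded as `-1`.

```scala
object NumPartitionsParam {
  // Hypothetical helper: null or a blank string means "not set";
  // anything else must parse as an Int (toInt throws NumberFormatException otherwise).
  def parse(raw: String): Option[Int] =
    Option(raw).map(_.trim).filter(_.nonEmpty).map(_.toInt)
}
```

The caller would then apply the setting only when the `Option` is defined and otherwise leave the estimator's default in place.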





[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-21 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107281460
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
+#' PFP distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param itemsCol Items column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_items", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(items = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, ',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5,
+#' itemsCol = "baskets", numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
--- End diff --

I'll remove it completely and just link to the docs.





[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-03-21 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/17088#discussion_r107283229
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1331,7 +1328,20 @@ class DAGScheduler(
 
   // TODO: mark the executor as failed only if there were lots of fetch failures on it
   if (bmAddress != null) {
-handleExecutorLost(bmAddress.executorId, filesLost = true, Some(task.epoch))
+val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled) {
+  // We had a fetch failure with the external shuffle service, so we
+  // assume all shuffle data on the node is bad.
+  Some(bmAddress.host)
+} else {
+  // Deregister shuffle data just for one executor (we don't have any
--- End diff --

nit: "Unregister" is used elsewhere (function names, etc.), not 
"deregister".





[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...

2017-03-21 Thread kunalkhamar
Github user kunalkhamar commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r107283640
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -17,43 +17,70 @@
 
 package org.apache.spark.sql.internal
 
-import java.io.File
-
 import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.fs.Path
 
+import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.{Analyzer, FunctionRegistry}
 import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.optimizer.Optimizer
 import org.apache.spark.sql.catalyst.parser.ParserInterface
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.command.AnalyzeTableCommand
 import org.apache.spark.sql.execution.datasources._
-import org.apache.spark.sql.streaming.{StreamingQuery, StreamingQueryManager}
+import org.apache.spark.sql.streaming.StreamingQueryManager
 import org.apache.spark.sql.util.ExecutionListenerManager
 
 
 /**
  * A class that holds all session-specific state in a given [[SparkSession]].
+ * @param functionRegistry Internal catalog for managing functions registered by the user.
+ * @param catalog Internal catalog for managing table and database states.
+ * @param sqlParser Parser that extracts expressions, plans, table identifiers etc. from SQL texts.
+ * @param analyzer Logical query plan analyzer for resolving unresolved attributes and relations.
+ * @param streamingQueryManager Interface to start and stop
+ *  [[org.apache.spark.sql.streaming.StreamingQuery]]s.
+ * @param queryExecutionCreator Lambda to create a [[QueryExecution]] from a [[LogicalPlan]]
--- End diff --

@rxin Removing the redundant comments in 
[SPARK-20048](https://github.com/apache/spark/pull/17379).





[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-03-21 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/17088#discussion_r107284085
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1389,8 +1423,7 @@ class DAGScheduler(
 clearCacheLocs()
   }
 } else {
-  logDebug("Additional executor lost message for " + execId +
-   "(epoch " + currentEpoch + ")")
+  logDebug("Additional executor lost message for %s (epoch %d)".format(execId, currentEpoch))
--- End diff --

nit: prefer string interpolation over `format`.
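For illustration, the two styles side by side, assuming `execId` is a `String` and `currentEpoch` a `Long`; the object and method names are made up for this sketch. Both produce the same text, and the interpolated form keeps each value next to where it appears in the message.

```scala
// Illustrative comparison of the two log-message styles discussed above.
object InterpolationExample {
  // Style in the diff: positional format specifiers filled by .format(...)
  def withFormat(execId: String, currentEpoch: Long): String =
    "Additional executor lost message for %s (epoch %d)".format(execId, currentEpoch)

  // Suggested style: s-interpolation, values inlined where they are used
  def withInterpolation(execId: String, currentEpoch: Long): String =
    s"Additional executor lost message for $execId (epoch $currentEpoch)"
}
```

At the call site this would read `logDebug(s"Additional executor lost message for $execId (epoch $currentEpoch)")`. The removed concatenation on the `-` lines also omitted the space before `(epoch`, which either new form fixes.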





[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-03-21 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/17088#discussion_r107284202
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1683,11 +1716,12 @@ private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler
   dagScheduler.handleExecutorAdded(execId, host)
 
 case ExecutorLost(execId, reason) =>
-  val filesLost = reason match {
-case SlaveLost(_, true) => true
+  val workerLost = reason match {
+case SlaveLost(_, true) =>
+  true
--- End diff --

nit: prefer it without the line break for something this simple
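A minimal sketch of the suggested layout, using simplified stand-ins for Spark's executor-loss reasons; the stand-in types and the `case _ => false` fallback are assumptions, since the rest of the match is cut off in the diff.

```scala
object WorkerLostExample {
  // Simplified stand-ins for Spark's executor-loss reasons (illustration only).
  sealed trait LossReason
  final case class SlaveLost(message: String, workerLost: Boolean) extends LossReason
  case object OtherReason extends LossReason

  // Short, single-expression cases stay on one line, as the nit suggests.
  def workerLost(reason: LossReason): Boolean = reason match {
    case SlaveLost(_, true) => true
    case _ => false // assumed fallback; the original diff is truncated here
  }
}
```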





[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17108#discussion_r107283840
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.linalg.{SQLDataTypes, Vector}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
+import org.apache.spark.mllib.stat.{Statistics => OldStatistics}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * API for statistical functions in MLlib, compatible with Dataframes and Datasets.
+ *
+ * The functions in this package generalize the functions in [[org.apache.spark.sql.Dataset.stat]]
+ * to spark.ml's Vector types.
+ */
+@Since("2.2.0")
+@Experimental
+object Correlations {
+
+  /**
+   * Compute the correlation matrix for the input RDD of Vectors using the specified method.
+   * Methods currently supported: `pearson` (default), `spearman`.
+   *
+   * @param dataset A dataset or a dataframe
+   * @param column The name of the column of vectors for which the correlation coefficient needs
+   *   to be computed. This must be a column of the dataset, and it must contain
+   *   Vector objects.
+   * @param method String specifying the method to use for computing correlation.
+   *   Supported: `pearson` (default), `spearman`
+   * @return A dataframe that contains the correlation matrix of the column of vectors. This
+   * dataframe contains a single row and a single column of name
+   * '$METHODNAME($COLUMN)'.
+   * @throws IllegalArgumentException if the column is not a valid column in the dataset, or if
+   *  the content of this column is not of type Vector.
+   *
+   *  Here is how to access the correlation coefficient:
+   *  {{{
+   *val data: Dataset[Vector] = ...
+   *val Row(coeff: Matrix) = Statistics.corr(data, "value").head
+   *// coeff now contains the Pearson correlation matrix.
+   *  }}}
+   *
+   * @note For Spearman, a rank correlation, we need to create an RDD[Double] for each column
+   * and sort it in order to retrieve the ranks and then join the columns back into an RDD[Vector],
+   * which is fairly costly. Cache the input RDD before calling corr with `method = "spearman"` to
+   * avoid recomputing the common lineage.
+   */
+  @Since("2.2.0")
+  def corr(dataset: Dataset[_], column: String, method: String): DataFrame = {
+val rdd = dataset.select(column).rdd.map {
+  case Row(v: Vector) => OldVectors.fromML(v)
+}
+val oldM = OldStatistics.corr(rdd, method)
+val name = s"$method($column)"
+val schema = StructType(Array(StructField(name, SQLDataTypes.MatrixType, nullable = true)))
+dataset.sparkSession.createDataFrame(Seq(Row(oldM.asML)).asJava, schema)
+  }
+
+  /**
+   * Compute the correlation matrix for the input Dataset of Vectors.
--- End diff --

Just say that this is a version of corr which defaults to "pearson" for the 
method.  Don't document params or return value.
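A plain-Scala sketch of the suggested shape, assuming two `Seq[Double]` samples instead of a Dataset column of vectors: the three-argument `corr` carries the full documentation, and the overload's Scaladoc only says that it defaults to `"pearson"`. `CorrelationSketch` is a hypothetical name and not the Spark API; its Spearman branch ranks each sample (average ranks for ties) and then applies Pearson.

```scala
object CorrelationSketch {
  /**
   * Computes the correlation of two equal-length samples.
   * Supported methods: "pearson" and "spearman".
   */
  def corr(x: Seq[Double], y: Seq[Double], method: String): Double = method match {
    case "pearson"  => pearson(x, y)
    case "spearman" => pearson(ranks(x), ranks(y))
    case other      => throw new IllegalArgumentException(s"Unsupported method: $other")
  }

  /** Same as `corr(x, y, method)`, with `method` defaulting to "pearson". */
  def corr(x: Seq[Double], y: Seq[Double]): Double = corr(x, y, "pearson")

  // Sample Pearson coefficient; yields NaN if either sample is constant.
  private def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length && x.length > 1, "need two samples of equal length >= 2")
    val mx = x.sum / x.length
    val my = y.sum / y.length
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val sx = math.sqrt(x.map(a => (a - mx) * (a - mx)).sum)
    val sy = math.sqrt(y.map(b => (b - my) * (b - my)).sum)
    cov / (sx * sy)
  }

  // 1-based ranks; tied values receive the average of their sorted positions.
  private def ranks(v: Seq[Double]): Seq[Double] = {
    val byValue = v.zipWithIndex.sortWith(_._1 < _._1).toVector
    val out = new Array[Double](v.length)
    var i = 0
    while (i < byValue.length) {
      var j = i
      while (j + 1 < byValue.length && byValue(j + 1)._1 == byValue(i)._1) j += 1
      val avg = (i + j) / 2.0 + 1.0
      (i to j).foreach(k => out(byValue(k)._2) = avg)
      i = j + 1
    }
    out.toVector
  }
}
```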





[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17108#discussion_r107074472
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.linalg.{SQLDataTypes, Vector}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
+import org.apache.spark.mllib.stat.{Statistics => OldStatistics}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * API for statistical functions in MLlib, compatible with Dataframes and Datasets.
--- End diff --

This should be limited to correlations





[GitHub] spark pull request #17108: [SPARK-19636][ML] Feature parity for correlation ...

2017-03-21 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17108#discussion_r107075473
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Correlations.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.linalg.{SQLDataTypes, Vector}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
+import org.apache.spark.mllib.stat.{Statistics => OldStatistics}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * API for statistical functions in MLlib, compatible with Dataframes and Datasets.
+ *
+ * The functions in this package generalize the functions in [[org.apache.spark.sql.Dataset.stat]]
+ * to spark.ml's Vector types.
+ */
+@Since("2.2.0")
+@Experimental
+object Correlations {
--- End diff --

How about calling it "Correlation" (singular)? Especially if we add a
builder pattern, I feel `new Correlation().set...` reads more naturally.
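A minimal sketch of what such a builder could look like, assuming setters that return `this.type` so calls can be chained; the `Correlation` class, its default `features` column, and its methods are hypothetical, not the API in this PR.

```scala
// Hypothetical builder-style API (illustration only, not the API in this PR).
class Correlation {
  private var method: String = "pearson"  // default method, as in the doc comment above
  private var column: String = "features" // assumed default input column name

  def setMethod(value: String): this.type = {
    require(Set("pearson", "spearman").contains(value), s"Unsupported method: $value")
    method = value
    this
  }

  def setColumn(value: String): this.type = { column = value; this }

  def getMethod: String = method
  def getColumn: String = column

  /** Output column name, following the '$METHODNAME($COLUMN)' convention in the diff above. */
  def outputName: String = s"$method($column)"
}
```

Usage would read `new Correlation().setMethod("spearman").setColumn("v")`, which is where the singular name fits naturally.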




