[GitHub] spark issue #15935: [SPARK-18188] add checksum for blocks of broadcast

2016-11-29 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15935
  
LGTM. Merging to master and 2.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15935: [SPARK-18188] add checksum for blocks of broadcas...

2016-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15935





[GitHub] spark pull request #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15975#discussion_r89956606
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -76,9 +76,6 @@ class JDBCOptions(
 
   // the number of partitions
   val numPartitions = parameters.get(JDBC_NUM_PARTITIONS).map(_.toInt)
--- End diff --

Reading the table using a single partition. 





[GitHub] spark pull request #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15975#discussion_r89956695
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -209,6 +209,14 @@ class JDBCSuite extends SparkFunSuite
 conn.close()
   }
 
+  // Check whether the tables are fetched in the expected degree of parallelism
+  def checkNumPartitions(df: DataFrame, expectedNumPartitions: Int): Unit = {
+    val explain = ExplainCommand(df.queryExecution.logical, extended = true)
+    val plans = spark.sessionState.executePlan(explain).executedPlan
+    val expectedMsg = s"${JDBCOptions.JDBC_NUM_PARTITIONS}=$expectedNumPartitions"
--- End diff --

Good idea!





[GitHub] spark pull request #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15975#discussion_r89956633
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -667,13 +667,13 @@ object JdbcUtils extends Logging {
 val getConnection: () => Connection = createConnectionFactory(options)
 val batchSize = options.batchSize
 val isolationLevel = options.isolationLevel
-val numPartitions = options.numPartitions
-val repartitionedDF =
-  if (numPartitions.isDefined && numPartitions.get < df.rdd.getNumPartitions) {
-df.coalesce(numPartitions.get)
-  } else {
-df
-  }
+val repartitionedDF = options.numPartitions match {
+  case Some(n) if n <= 0 => throw new IllegalArgumentException(
--- End diff --

Yeah. 
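
The match-based handling in the diff above can be sketched without Spark as follows. This is only an illustration under stated assumptions, not the code in the PR: `NumPartitionsSketch`, `targetPartitions`, its parameters, and the error message are hypothetical, and because the diff is cut off after the first case, the remaining two cases are assumed to mirror the removed `coalesce` logic.

```scala
object NumPartitionsSketch {
  // Hypothetical helper: decides how many partitions a write should use.
  // `requested` models options.numPartitions; `current` models df.rdd.getNumPartitions.
  def targetPartitions(requested: Option[Int], current: Int): Int = requested match {
    // Reject non-positive values up front instead of silently ignoring them.
    case Some(n) if n <= 0 =>
      throw new IllegalArgumentException(s"Invalid number of partitions: $n (assumed message)")
    // Only shrink: this models coalescing to n when n is below the current count.
    case Some(n) if n < current => n
    // Otherwise keep the current partitioning unchanged.
    case _ => current
  }
}
```

Using guarded cases keeps the validation and the two fallbacks in one expression, which avoids the `isDefined`/`get` pair of the removed code.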





[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15954: [WIP][SPARK-18516][SQL] Split state and progress in stre...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" fe...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16052
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16017
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69306/
Test FAILed.





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16048
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16028
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69305/
Test FAILed.





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16017
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15954: [WIP][SPARK-18516][SQL] Split state and progress in stre...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16048
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69309/
Test FAILed.





[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69307/
Test FAILed.





[GitHub] spark issue #16045: [SPARK-18553][CORE] Fix leak of TaskSetManager following...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16045
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15954: [WIP][SPARK-18516][SQL] Split state and progress in stre...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69308/
Test FAILed.





[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16028
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" fe...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69303/
Test FAILed.





[GitHub] spark issue #15954: [WIP][SPARK-18516][SQL] Split state and progress in stre...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69295/
Test FAILed.





[GitHub] spark issue #16045: [SPARK-18553][CORE] Fix leak of TaskSetManager following...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69304/
Test FAILed.





[GitHub] spark pull request #16055: [SPARK-17897] [SQL] Attribute is not NullIntolera...

2016-11-29 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/16055

[SPARK-17897] [SQL] Attribute is not NullIntolerant

### What changes were proposed in this pull request?
`Attribute` is not `NullIntolerant`. This PR fixes it.

Without the fix, the following test case returns an empty result.
```Scala
val data = Seq[java.lang.Integer](1, null).toDF("key")
data.filter("not key is not null").show()
```
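
For comparison, the result this filter is expected to produce can be modeled in plain Scala, without Spark. This is a sketch under assumptions, not part of the patch: `NotIsNotNullSketch` and its members are hypothetical names, and `None` stands in for SQL `NULL`.

```scala
object NotIsNotNullSketch {
  // Hypothetical stand-in for the `key` column: None models SQL NULL.
  val keys: Seq[Option[Int]] = Seq(Some(1), None)

  // `key IS NOT NULL` always yields a definite Boolean, even for NULL input.
  def isNotNull(key: Option[Int]): Boolean = key.isDefined

  // `NOT (key IS NOT NULL)` keeps only the rows whose key is NULL.
  val kept: Seq[Option[Int]] = keys.filter(k => !isNotNull(k))
}
```

Under this model the filter keeps exactly the one row with the null key, so an empty result points at an incorrectly inferred null constraint rather than at the predicate itself.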

### How was this patch tested?
Added a test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark isNotNull

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16055.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16055


commit 33c10a0994c9802df901f211e1f28c52e34df27f
Author: gatorsmile 
Date:   2016-11-29T08:00:55Z

fix.







[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
Jenkins, retest this please.





[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant

2016-11-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16055
  
Can you explain how `NullIntolerant` impacted this case?






[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16055
  
**[Test build #69310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69310/consoleFull)** for PR 16055 at commit [`33c10a0`](https://github.com/apache/spark/commit/33c10a0994c9802df901f211e1f28c52e34df27f).





[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69312/consoleFull)** for PR 16030 at commit [`43f028d`](https://github.com/apache/spark/commit/43f028d3b495a825cebff39daa12d6a2f25f0110).





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16048
  
**[Test build #69311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69311/consoleFull)** for PR 16048 at commit [`fa313e5`](https://github.com/apache/spark/commit/fa313e5fdd60783df6fb96403802d4d1558b8cca).





[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16055
  
Sure, will update the PR description tomorrow. Thanks!





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-29 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16017
  
Jenkins, test this please.





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16017
  
**[Test build #69313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69313/consoleFull)** for PR 16017 at commit [`30f5096`](https://github.com/apache/spark/commit/30f5096ce9dce89e3d3a3014bc53164cc2af2788).





[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16014
  
@shivaram Cool, I knew about release-build but I didn't know it runs on
Jenkins. I *think* we should be OK, but we might want to check that Jenkins
has "e1071" and "survival", which are optional for compatibility tests but
which `R CMD check` requires. If you recall,
[this](https://github.com/apache/spark/pull/15790#issuecomment-259780799) is
the conversation that prompted this change.

@rxin This PR updates what goes into the Spark binary release to match what
we (intend to) release on CRAN for the R package.

As for the diff, this is the delta between this PR and Spark 2.0.2 under
the R/lib/SparkR directory. It turns out `R CMD check` also depends on Rd file
generation in install-dev.sh (i.e. `devtools::document(pkg="./pkg",
roclets=c("rd"))`). It is going to take more time to untangle this, so that
is left for a follow-up.

_what's additional_
```
SparkR/
-rw-r--r--   INDEX
drwxr-xr-x   doc

SparkR/Meta/
-rw-r--r--   vignette.rds

SparkR/doc/
-rw-r--r--   sparkr-vignettes.Rmd
-rw-r--r--   sparkr-vignettes.R
-rw-r--r--   sparkr-vignettes.html
-rw-r--r--   index.html
```

_what's omitted_
```
SparkR/html/
-rw-r--r--  1 root root  1319 Nov 29 08:05 R.css
-rw-r--r--  1 root root 81153 Nov 29 08:05 00Index.html
```

Where it used to have files such as `year.html`, `write.parquet.html`, and
`sparkR.session.html`, the html directory now has only 2 files. My
understanding is that these knitr HTML outputs are not actually used at
runtime. I checked that `?sparkR.session` in `sparkR` still works correctly.






[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16028
  
**[Test build #69314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69314/consoleFull)** for PR 16028 at commit [`74cb363`](https://github.com/apache/spark/commit/74cb3639278fa525dfe5b75d11a7a8dcd06f04a3).





[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2016-11-29 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/16028
  
Jenkins, retest this please





[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16014
  
**[Test build #69315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69315/consoleFull)** for PR 16014 at commit [`c9c9802`](https://github.com/apache/spark/commit/c9c9802e67178e2283b1ebc9fa13f39db916773d).





[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16061
  
@erikerlandson For the RAT failure, you may either add the Apache license
header to the newly added files or add the files to `dev/.rat-excludes`.
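
For the first option, the header is a comment block placed at the very top of each new source file. The block below shows the standard ASF source header in Scala comment syntax as a sketch; copying it from an existing file in the repository is the safer route, since that guarantees the exact wording RAT checks for.

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
```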





[GitHub] spark issue #15972: [SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, Deve...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69338/
Test PASSed.





[GitHub] spark pull request #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter met...

2016-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16017





[GitHub] spark pull request #16064: [SPARK-18633][ML][Example]: Add multiclass logist...

2016-11-29 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/16064

[SPARK-18633][ML][Example]: Add multiclass logistic regression summary python example and document

## What changes were proposed in this pull request?
The logistic regression summary has been added to the Python API. We need to 
add an example and documentation for it.

The newly added example is consistent with Scala and Java examples.

## How was this patch tested?

Manual tests: ran the example with spark-submit, copied and pasted the code 
into pyspark, and built the documentation and checked the result.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark py

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16064


commit 70400890ca83b91ea44b0d34bf53b753f07ba46b
Author: wm...@hotmail.com 
Date:   2016-11-29T19:51:55Z

add python example







[GitHub] spark issue #16064: [SPARK-18633][ML][Example]: Add multiclass logistic regr...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16064
  
**[Test build #69344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69344/consoleFull)** for PR 16064 at commit [`7040089`](https://github.com/apache/spark/commit/70400890ca83b91ea44b0d34bf53b753f07ba46b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15924: [SPARK-18498] [SQL] Revise HDFSMetadataLog API fo...

2016-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15924





[GitHub] spark issue #16065: [SPARK-18631][SQL] Changed ExchangeCoordinator re-partit...

2016-11-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16065
  
cc @yhuai 





[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-29 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15998
  
* looks good once InMemoryCatalog is fixed





[GitHub] spark pull request #15954: [SPARK-18516][SQL] Split state and progress in st...

2016-11-29 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/15954#discussion_r90130320
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala ---
@@ -64,23 +68,26 @@ trait StreamingQuery {
 
   /**
    * Returns the current status of the query.
+   *
    * @since 2.0.2
    */
   def status: StreamingQueryStatus
 
   /**
-   * Returns current status of all the sources.
-   * @since 2.0.0
+   * Returns an array of the most recent [[StreamingQueryProgress]] updates for this query.
+   * The number of progress updates retained for each stream is configured by Spark session
+   * configuration `spark.sql.streaming.numRecentProgresses`.
+   *
+   * @since 2.1.0
    */
-  @deprecated("use status.sourceStatuses", "2.0.2")
-  def sourceStatuses: Array[SourceStatus]
+  def recentProgresses: Array[StreamingQueryProgress]
--- End diff --

Are these for the last `n` triggers? Or are they the last `n` instantaneous 
progress updates, e.g. finished reading from a source, etc.?
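
Whichever granularity is intended, the retention described in the doc comment amounts to a bounded buffer that keeps only the newest `n` entries. A minimal sketch of that idea, assuming one entry per recorded update; `RecentProgressBuffer`, `record`, and `recent` are hypothetical names used for illustration and are not Spark APIs:

```python
from collections import deque


class RecentProgressBuffer:
    """Hypothetical stand-in for a query's progress history; not Spark code."""

    def __init__(self, num_recent):
        # deque(maxlen=n) drops the oldest entry once n items are held.
        self._buffer = deque(maxlen=num_recent)

    def record(self, progress):
        # Append the newest update; the oldest one is evicted if the buffer is full.
        self._buffer.append(progress)

    def recent(self):
        # Oldest retained update first, newest last.
        return list(self._buffer)


buf = RecentProgressBuffer(num_recent=3)
for trigger_id in range(5):
    buf.record({"trigger": trigger_id})
```

After five recorded updates with `num_recent=3`, only the last three remain. Whether an "update" corresponds to a trigger or to something finer-grained is exactly what the question above asks, and the sketch does not settle it.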





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16048
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #13557: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KM...

2016-11-29 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13557#discussion_r90132591
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -316,6 +316,36 @@ def computeCost(self, dataset):
         """
         return self._call_java("computeCost", dataset)
 
+    @property
+    @since("2.1.0")
+    def hasSummary(self):
+        """
+        Indicates whether a training summary exists for this model instance.
+        """
+        return self._call_java("hasSummary")
+
+    @property
+    @since("2.1.0")
+    def summary(self):
+        """
+        Gets summary (e.g. cluster assignments, cluster sizes) of the model trained on the
+        training set. An exception is thrown if no summary exists.
+        """
+        if self.hasSummary:
+            return KMeansSummary(self._call_java("summary"))
+        else:
+            raise RuntimeError("No training summary available for this %s" %
+                               self.__class__.__name__)
+
+
+class KMeansSummary(ClusteringSummary):
--- End diff --

Let's move it after the `KMeans` class like the others.
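
For readers skimming the diff, the guard it adds can be sketched in plain Python without a JVM. This is only an illustration under stated assumptions: `ModelSketch` is a hypothetical class, and its `_summary` field stands in for whatever `_call_java("summary")` would return in the real model:

```python
class ModelSketch:
    """Hypothetical model wrapper; `_summary` stands in for the JVM-side summary."""

    def __init__(self, summary=None):
        self._summary = summary

    @property
    def hasSummary(self):
        # True only when a training summary was attached to this instance.
        return self._summary is not None

    @property
    def summary(self):
        # Return the summary if present, otherwise fail with a descriptive error.
        if self.hasSummary:
            return self._summary
        raise RuntimeError(
            "No training summary available for this %s" % self.__class__.__name__)


trained = ModelSketch(summary={"clusterSizes": [3, 2]})
loaded = ModelSketch()
```

Callers would check `hasSummary` before reading `summary`, for example on a model that was loaded from disk rather than trained in the current session.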





[GitHub] spark issue #16065: [SPARK-18631][SQL] Changed ExchangeCoordinator re-partit...

2016-11-29 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16065
  
Thanks @markhamstra. Merging to master.





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69346/
Test PASSed.





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16068: stateful udf should be nondeterministic

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16068
  
**[Test build #69361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69361/consoleFull)** for PR 16068 at commit [`a4d8b4a`](https://github.com/apache/spark/commit/a4d8b4af648e53c355bee16fe371137d0b349331).





[GitHub] spark issue #16066: [SPARK-18632][SQL] AggregateFunction should not implemen...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16066
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16066: [SPARK-18632][SQL] AggregateFunction should not implemen...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16066
  
**[Test build #69349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69349/consoleFull)** for PR 16066 at commit [`9a722cf`](https://github.com/apache/spark/commit/9a722cf3d48850ab6579db856876bada8749330c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes `
  * `abstract class CentralMomentAgg(child: Expression)`
  * `case class Corr(x: Expression, y: Expression)`
  * `abstract class Covariance(x: Expression, y: Expression)`
  * `case class First(child: Expression, ignoreNullsExpr: Expression)`
  * `case class Last(child: Expression, ignoreNullsExpr: Expression)`
  * `case class Sum(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes `
  * `sealed abstract class AggregateFunction extends Expression `





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16067
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16067
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69350/
Test PASSed.





[GitHub] spark issue #16045: [SPARK-18553][CORE] Fix leak of TaskSetManager following...

2016-11-29 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/16045
  
Cool, I'm going to merge this into master and branch-2.1 in that case. 
Thanks!





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15954
  
**[Test build #69355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69355/consoleFull)** for PR 15954 at commit [`69d9b4a`](https://github.com/apache/spark/commit/69d9b4a1de6c7f07bb2153b02d3ffabbb87eaac1).





[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15255
  
**[Test build #69357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69357/consoleFull)** for PR 15255 at commit [`57817a1`](https://github.com/apache/spark/commit/57817a1c96c9577725ee8766834b20b06adfe521).





[GitHub] spark issue #16065: [SPARK-18631][SQL] Changed ExchangeCoordinator re-partit...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16065
  
**[Test build #69347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69347/consoleFull)** for PR 16065 at commit [`561fcf6`](https://github.com/apache/spark/commit/561fcf67bd3c1541352b00f33981a44fa58a6ccc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16048
  
**[Test build #69359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69359/consoleFull)** for PR 16048 at commit [`eaa4a73`](https://github.com/apache/spark/commit/eaa4a73b4c30a446e8144339de3eca71d0b5dfdf).





[GitHub] spark issue #15982: [SPARK-18546][core] Fix merging shuffle spills when usin...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15982
  
**[Test build #69360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69360/consoleFull)** for PR 15982 at commit [`1025c6b`](https://github.com/apache/spark/commit/1025c6bb384968a7fc474d35a1bb18d82eb21938).





[GitHub] spark pull request #16065: [SPARK-18631][SQL] Changed ExchangeCoordinator re...

2016-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16065





[GitHub] spark pull request #14164: [SPARK-16629] Allow comparisons between UDTs and ...

2016-11-29 Thread damnMeddlingKid
Github user damnMeddlingKid closed the pull request at:

https://github.com/apache/spark/pull/14164





[GitHub] spark issue #13557: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of...

2016-11-29 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13557
  
Thanks for updating this, @zjffdu. It looks good to me now that @sethah's 
comments have been addressed. Maybe we can get @davies or @MLnick to take a 
final pass?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15496: [SPARK-17950] [Python] Match SparseVector behavio...

2016-11-29 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15496#discussion_r90139924
  
--- Diff: python/pyspark/ml/linalg/__init__.py ---
@@ -705,6 +705,22 @@ def __eq__(self, other):
             return Vectors._equals(self.indices, self.values, list(xrange(len(other))), other.array)
         return False
 
+    def __getattr__(self, item):
+        def wrapper(*args, **kwargs):
+            if _have_scipy:
+                csr = scipy.sparse.csr_matrix((np.append(self.values, 0),
+                                               np.append(self.indices, self.size-1),
+                                               [0, len(self.values)]))
+                func = getattr(csr, item)
+                result = func(*args, **kwargs)
+                if isinstance(result, scipy.sparse.csr.csr_matrix):
+                    return SparseVector(result.shape[1],result.indices,result.data)
+                return result
+            else:
+                raise AttributeError(
+                    "'{0}' object has no attribute '{1}' or SciPy not installed.".format(self.__class__, item))
--- End diff --

OK, so maybe we can improve the error message to something like "'{0}' 
object has no attribute '{1}' and SciPy is not installed to proxy request to 
SparseVector" (or similar).

Saying it's X or Y is confusing, since this error message only appears 
when SciPy is not installed.

What do you think?
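
To make the suggestion concrete, here is a minimal, SciPy-free sketch of an attribute proxy that raises the clearer message. Everything in it is an assumption for illustration: `VectorSketch` and `ListBackend` are hypothetical classes, and the `have_backend` flag models whether SciPy could be imported. Unlike the diff above, the sketch raises at attribute lookup rather than inside a wrapper function, which keeps the example short:

```python
class ListBackend:
    """Hypothetical stand-in for the SciPy matrix that requests are proxied to."""

    def __init__(self, values):
        self._values = values

    def sum(self):
        return sum(self._values)


class VectorSketch:
    """Hypothetical stand-in for SparseVector; not the PySpark class."""

    def __init__(self, values, have_backend):
        self.values = values
        self._have_backend = have_backend  # False models "SciPy is not installed"

    def __getattr__(self, item):
        # __getattr__ runs only when normal attribute lookup has already failed.
        if item.startswith("_"):
            # Avoid recursion if a private attribute is missing (e.g. during copy).
            raise AttributeError(item)
        if not self._have_backend:
            raise AttributeError(
                "'{0}' object has no attribute '{1}' and SciPy is not installed "
                "to proxy the request".format(type(self).__name__, item))
        # Delegate to the backend; it raises its own AttributeError if needed.
        return getattr(ListBackend(self.values), item)
```

With a backend available, `VectorSketch([1.0, 2.0], have_backend=True).sum()` is delegated to the backend; without one, the lookup fails with a message that names the single actual cause.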





[GitHub] spark issue #15982: [SPARK-18546][core] Fix merging shuffle spills when usin...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15982
  
**[Test build #69348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69348/consoleFull)** for PR 15982 at commit [`8ac9276`](https://github.com/apache/spark/commit/8ac927623c5d7809208b766001f46ea2ad576af9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #69362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69362/consoleFull)** for PR 14638 at commit [`5355de9`](https://github.com/apache/spark/commit/5355de90cfe71a83a1fd1a88bf4e7a7135fb5def).





[GitHub] spark pull request #15946: [SPARK-18513][Structured Streaming] Record and re...

2016-11-29 Thread lw-lin
Github user lw-lin closed the pull request at:

https://github.com/apache/spark/pull/15946





[GitHub] spark issue #15946: [SPARK-18513][Structured Streaming] Record and recover w...

2016-11-29 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/15946
  
sure! closing this.





[GitHub] spark pull request #15954: [SPARK-18516][SQL] Split state and progress in st...

2016-11-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15954#discussion_r90147041
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala ---
@@ -33,25 +35,27 @@ trait StreamingQuery {
    * Returns the name of the query. This name is unique across all active queries. This can be
--- End diff --

yeah. it is. 





[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15780
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69342/
Test PASSed.





[GitHub] spark issue #16065: [SPARK-18631][SQL] Changed ExchangeCoordinator re-partit...

2016-11-29 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16065
  
lgtm





[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r88075283
  
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapRedCommitProtocol.scala ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.internal.io
+
+import org.apache.hadoop.mapred._
+import org.apache.hadoop.mapreduce.{TaskAttemptContext => NewTaskAttemptContext}
+
+/**
+ * An [[FileCommitProtocol]] implementation backed by an underlying Hadoop OutputCommitter
+ * (from the old mapred API).
+ *
+ * Unlike Hadoop's OutputCommitter, this implementation is serializable.
+ */
+class HadoopMapRedCommitProtocol(jobId: String, path: String)
+  extends HadoopMapReduceCommitProtocol(jobId, path) {
+
+  override def setupCommitter(context: NewTaskAttemptContext): OutputCommitter = {
+    val config = context.getConfiguration.asInstanceOf[JobConf]
+    config.getOutputCommitter
--- End diff --

Do we need a setupJob on the committer here?
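
For readers following the question, here is a minimal sketch of the call order described by the workflow comment quoted in the SparkHadoopWriter.scala diff of this same pull request: job setup runs once before any task, each task is either committed or aborted, and the job is committed only if every task committed. The `Committer` interface, `RecordingCommitter`, and `CommitLifecycle` are hypothetical names invented for illustration; they are not the Hadoop `OutputCommitter` API or Spark's `FileCommitProtocol`, and the sketch does not answer whether `setupJob` belongs in `setupCommitter`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration only; it mirrors the general shape
// of a Hadoop-style committer but is not the real OutputCommitter API.
interface Committer {
    void setupJob();
    void setupTask(int taskId);
    void commitTask(int taskId);
    void abortTask(int taskId);
    void commitJob();
    void abortJob();
}

// Records every call so that the resulting order can be inspected.
class RecordingCommitter implements Committer {
    final List<String> calls = new ArrayList<>();
    public void setupJob() { calls.add("setupJob"); }
    public void setupTask(int taskId) { calls.add("setupTask:" + taskId); }
    public void commitTask(int taskId) { calls.add("commitTask:" + taskId); }
    public void abortTask(int taskId) { calls.add("abortTask:" + taskId); }
    public void commitJob() { calls.add("commitJob"); }
    public void abortJob() { calls.add("abortJob"); }
}

class CommitLifecycle {
    // Runs the tasks sequentially (a real job would run them on executors).
    // Returns true if the job was committed, false if it was aborted.
    static boolean run(Committer committer, List<Runnable> tasks) {
        committer.setupJob();
        try {
            for (int i = 0; i < tasks.size(); i++) {
                committer.setupTask(i);
                try {
                    tasks.get(i).run();
                    committer.commitTask(i);
                } catch (RuntimeException e) {
                    // A failure in the task body or in commitTask aborts that task.
                    committer.abortTask(i);
                    throw e;
                }
            }
            committer.commitJob();
            return true;
        } catch (RuntimeException e) {
            // A failed task or a failed commitJob aborts the whole job.
            committer.abortJob();
            return false;
        }
    }
}
```

In this sketch a committer whose `setupJob` was never called would start receiving `setupTask` calls against an unprepared job, which is the kind of ordering concern the question raises.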






[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r90122251
  
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.internal.io
+
+import java.text.NumberFormat
+import java.util.{Date, Locale}
+
+import scala.reflect.ClassTag
+
+import org.apache.hadoop.conf.{Configurable, Configuration}
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.mapred._
+import org.apache.hadoop.mapreduce.{JobContext => NewJobContext, OutputFormat => NewOutputFormat, RecordWriter => NewRecordWriter, TaskAttemptContext => NewTaskAttemptContext, TaskAttemptID => NewTaskAttemptID, TaskType}
+import org.apache.hadoop.mapreduce.task.{TaskAttemptContextImpl => NewTaskAttemptContextImpl}
+
+import org.apache.spark.{SerializableWritable, SparkException, TaskContext}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.executor.OutputMetrics
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
+import org.apache.spark.rdd.{HadoopRDD, RDD}
+import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils}
+
+/**
+ * A helper object that saves an RDD using a Hadoop OutputFormat
+ * (from the old mapred API).
+ */
+private[spark]
+object SparkHadoopWriter extends Logging {
+  import SparkHadoopWriterUtils._
+
+  /**
+   * Basic work flow of this command is:
+   * 1. Driver side setup, prepare the data source and hadoop configuration for the write job to
+   *    be issued.
+   * 2. Issues a write job consists of one or more executor side tasks, each of which writes all
+   *    rows within an RDD partition.
+   * 3. If no exception is thrown in a task, commits that task, otherwise aborts that task;  If any
+   *    exception is thrown during task commitment, also aborts that task.
+   * 4. If all tasks are committed, commit the job, otherwise aborts the job;  If any exception is
+   *    thrown during job commitment, also aborts the job.
+   */
+  def write[K, V: ClassTag](
+      rdd: RDD[(K, V)],
+      config: HadoopWriteConfigUtil[K, V]): Unit = {
+    // Extract context and configuration from RDD.
+    val sparkContext = rdd.context
+    val stageId = rdd.id
+    val sparkConf = rdd.conf
+
+    // Set up a job.
+    val jobTrackerId = createJobTrackerID(new Date())
+    val jobContext = config.createJobContext(jobTrackerId, stageId)
+    config.initOutputFormat(jobContext)
+
+    // Assert the output format/key/value class is set in JobConf.
+    config.assertConf()
+
+    if (isOutputSpecValidationEnabled(sparkConf)) {
+      // FileOutputFormat ignores the filesystem parameter
+      config.checkOutputSpecs(jobContext)
+    }
+
+    val committer = config.createCommitter(stageId)
+    committer.setupJob(jobContext)
+
+    // When speculation is on and output committer class name contains "Direct", we should warn
+    // users that they may loss data if they are using a direct output committer.
+    // There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
+    // result of using direct output committer with speculation enabled.
+    if (isSpeculationEnabled(sparkConf) && committer.isDirectOutput) {
+      val warningMessage =
+        s"$committer may be an output committer that writes data directly to " +
+          "the final location. Because speculation is enabled, this output committer may " +
+          "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
+          "committer that does not have this behavior (e.g. FileOutputCommitter)."
+      logWarning(warningMessage)
+    }
+
+    // Try to write

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r90121527
  
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.internal.io
+
+import java.text.NumberFormat
+import java.util.{Date, Locale}
+
+import scala.reflect.ClassTag
+
+import org.apache.hadoop.conf.{Configurable, Configuration}
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.mapred._
+import org.apache.hadoop.mapreduce.{JobContext => NewJobContext, OutputFormat => NewOutputFormat, RecordWriter => NewRecordWriter, TaskAttemptContext => NewTaskAttemptContext, TaskAttemptID => NewTaskAttemptID, TaskType}
+import org.apache.hadoop.mapreduce.task.{TaskAttemptContextImpl => NewTaskAttemptContextImpl}
+
+import org.apache.spark.{SerializableWritable, SparkException, TaskContext}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.executor.OutputMetrics
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
+import org.apache.spark.rdd.{HadoopRDD, RDD}
+import org.apache.spark.util.{SerializableConfiguration, SerializableJobConf, Utils}
+
+/**
+ * A helper object that saves an RDD using a Hadoop OutputFormat
+ * (from the old mapred API).
+ */
+private[spark]
+object SparkHadoopWriter extends Logging {
+  import SparkHadoopWriterUtils._
+
+  /**
+   * Basic work flow of this command is:
+   * 1. Driver side setup, prepare the data source and hadoop configuration for the write job to
+   *    be issued.
+   * 2. Issues a write job consists of one or more executor side tasks, each of which writes all
+   *    rows within an RDD partition.
+   * 3. If no exception is thrown in a task, commits that task, otherwise aborts that task;  If any
+   *    exception is thrown during task commitment, also aborts that task.
+   * 4. If all tasks are committed, commit the job, otherwise aborts the job;  If any exception is
+   *    thrown during job commitment, also aborts the job.
+   */
+  def write[K, V: ClassTag](
+      rdd: RDD[(K, V)],
+      config: SparkHadoopWriterConfig[K, V]): Unit = {
+    // Extract context and configuration from RDD.
+    val sparkContext = rdd.context
+    val stageId = rdd.id
+    val sparkConf = rdd.conf
+
+    // Set up a job.
+    val jobTrackerId = createJobTrackerID(new Date())
+    val jobContext = config.createJobContext(jobTrackerId, stageId)
+    config.initOutputFormat(jobContext)
+
+    // Assert the output format/key/value class is set in JobConf.
+    config.assertConf()
+
+    if (isOutputSpecValidationEnabled(sparkConf)) {
+      // FileOutputFormat ignores the filesystem parameter
+      config.checkOutputSpecs(jobContext)
+    }
+
+    val committer = config.createCommitter(stageId)
+    committer.setupJob(jobContext)
+
+    // When speculation is on and output committer class name contains "Direct", we should warn
+    // users that they may loss data if they are using a direct output committer.
+    // There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
+    // result of using direct output committer with speculation enabled.
+    if (isSpeculationEnabled(sparkConf) && committer.isDirectOutput) {
+      val warningMessage =
+        s"$committer may be an output committer that writes data directly to " +
+          "the final location. Because speculation is enabled, this output committer may " +
+          "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
+          "committer that does not have this behavior (e.g. FileOutputCommitter)."
+      logWarning(warningMessage)
+    }
+
+    // Try to

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r90120144
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -0,0 +1,408 @@

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r90127987
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -0,0 +1,408 @@

[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16067
  
**[Test build #69354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69354/consoleFull)** for PR 16067 at commit [`f693040`](https://github.com/apache/spark/commit/f693040d8bd1bfcf7ddeda7a6eabfce1de08c62a).





[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r90129155
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1089,66 +1064,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
    * MapReduce job.
    */
   def saveAsHadoopDataset(conf: JobConf): Unit = self.withScope {
-    // Rename this as hadoopConf internally to avoid shadowing (see SPARK-2038).
-    val hadoopConf = conf
-    val outputFormatInstance = hadoopConf.getOutputFormat
-    val keyClass = hadoopConf.getOutputKeyClass
-    val valueClass = hadoopConf.getOutputValueClass
-    if (outputFormatInstance == null) {
-      throw new SparkException("Output format class not set")
-    }
-    if (keyClass == null) {
-      throw new SparkException("Output key class not set")
-    }
-    if (valueClass == null) {
-      throw new SparkException("Output value class not set")
-    }
-    SparkHadoopUtil.get.addCredentials(hadoopConf)
-
-    logDebug("Saving as hadoop file of type (" + keyClass.getSimpleName + ", " +
-      valueClass.getSimpleName + ")")
-
-    if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(self.conf)) {
-      // FileOutputFormat ignores the filesystem parameter
-      val ignoredFs = FileSystem.get(hadoopConf)
-      hadoopConf.getOutputFormat.checkOutputSpecs(ignoredFs, hadoopConf)
-    }
--- End diff --

These validations should go into HadoopMapReduceWriteConfigUtil





[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15861#discussion_r90116879
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1016,11 +1013,6 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   /**
    * Output the RDD to any Hadoop-supported file system, using a Hadoop `OutputFormat` class
    * supporting the key and value types K and V in this RDD.
-   *
-   * @note We should make sure our tasks are idempotent when speculation is enabled, i.e. do
-   * not use output committer that writes data directly.
-   * There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
-   * result of using direct output committer with speculation enabled.
--- End diff --

Why was this removed? It is still relevant now, even if it is checked in a
different method invoked from here.
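
To make the removed note concrete, here is a minimal sketch of why a committer that writes directly to the final location is risky when two attempts of the same task can run, and how staging per-attempt output avoids it. `AttemptOutput` and its methods are hypothetical names for illustration, not Spark or Hadoop APIs, and the failure shown is only one assumed way the problem can surface; SPARK-10063 has the actual case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only.
class AttemptOutput {
    // Direct style: every attempt writes straight to the final file, so a
    // later (possibly incomplete) attempt overwrites an earlier complete one.
    static void writeDirect(Path finalFile, String data) throws IOException {
        Files.write(finalFile, data.getBytes(StandardCharsets.UTF_8));
    }

    // Staged style: each attempt writes to its own file next to the final one.
    static Path writeStaged(Path finalFile, int attempt, String data) throws IOException {
        Path staged = finalFile.resolveSibling(finalFile.getFileName() + ".attempt-" + attempt);
        Files.write(staged, data.getBytes(StandardCharsets.UTF_8));
        return staged;
    }

    // Commit publishes exactly one attempt's output by moving it into place.
    static void commit(Path staged, Path finalFile) throws IOException {
        Files.move(staged, finalFile, StandardCopyOption.REPLACE_EXISTING);
    }

    // Abort discards an attempt's output without touching the final file.
    static void abort(Path staged) throws IOException {
        Files.deleteIfExists(staged);
    }
}
```

With `writeDirect`, whichever attempt writes last determines the final contents, even if that attempt is later killed. With `writeStaged`, a losing attempt can be aborted without affecting what `commit` published.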





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15954
  
**[Test build #69353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69353/consoleFull)** for PR 15954 at commit [`c11d2e5`](https://github.com/apache/spark/commit/c11d2e51dd1bbbcededeb48db83dd8e060f9c0ae).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69353/





[GitHub] spark issue #16066: [SPARK-18632][SQL] AggregateFunction should not implemen...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16066
  
**[Test build #69356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69356/consoleFull)** for PR 16066 at commit [`1246792`](https://github.com/apache/spark/commit/1246792cdcf96a4eb1ecfa158aaf6861269735a8).





[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15982#discussion_r90133223
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java ---
@@ -337,42 +340,47 @@ void forceSorterToSpill() throws IOException {
     final int numPartitions = partitioner.numPartitions();
     final long[] partitionLengths = new long[numPartitions];
     final InputStream[] spillInputStreams = new FileInputStream[spills.length];
-    OutputStream mergedFileOutputStream = null;
+
+    // Use a counting output stream to avoid having to close the underlying file and ask
+    // the file system for its size after each partition is written.
+    final CountingOutputStream mergedFileOutputStream = new CountingOutputStream(
+      new FileOutputStream(outputFile));
 
     boolean threwException = true;
     try {
       for (int i = 0; i < spills.length; i++) {
         spillInputStreams[i] = new FileInputStream(spills[i].file);
       }
       for (int partition = 0; partition < numPartitions; partition++) {
-        final long initialFileLength = outputFile.length();
-        mergedFileOutputStream =
-          new TimeTrackingOutputStream(writeMetrics, new FileOutputStream(outputFile, true));
+        final long initialFileLength = mergedFileOutputStream.getByteCount();
+        // Shield the underlying output stream from close() calls, so that we can close the higher
+        // level streams to make sure all data is really flushed and internal state is cleaned.
+        OutputStream partitionOutput = new CloseShieldOutputStream(mergedFileOutputStream);
+        partitionOutput = blockManager.serializerManager().wrapForEncryption(partitionOutput);
         if (compressionCodec != null) {
-          mergedFileOutputStream = compressionCodec.compressedOutputStream(mergedFileOutputStream);
+          partitionOutput = compressionCodec.compressedOutputStream(partitionOutput);
         }
-
+        partitionOutput = new TimeTrackingOutputStream(writeMetrics, partitionOutput);
--- End diff --

Hmm... let me revert this and open a bug. `DiskBlockObjectWriter` doesn't 
count the time for compression / encryption, so this should behave the same. 
Both should be fixed together.
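
As a rough illustration of the stream layering in the diff above, the sketch below writes several compressed partitions into one sink and derives each partition's length from a byte counter instead of asking the file system. It uses only the JDK: `CountingStream` and `CloseShield` are hand-written stand-ins (assumptions for this example) for the Commons IO `CountingOutputStream` and `CloseShieldOutputStream` used in the patch, GZIP stands in for Spark's compression codec, and encryption and time tracking are omitted.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Stand-in for a counting stream: tracks how many bytes reached the sink.
class CountingStream extends FilterOutputStream {
    private long count = 0;
    CountingStream(OutputStream out) { super(out); }
    @Override public void write(int b) throws IOException {
        out.write(b);
        count++;
    }
    @Override public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }
    long getByteCount() { return count; }
}

// Stand-in for a close shield: close() flushes but leaves the sink open.
class CloseShield extends FilterOutputStream {
    CloseShield(OutputStream out) { super(out); }
    @Override public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
    }
    @Override public void close() throws IOException {
        out.flush();
    }
}

class PartitionMerge {
    // Writes each partition as its own GZIP member and returns the number of
    // bytes each one occupies in the sink.
    static long[] writePartitions(OutputStream sink, String[] partitions) throws IOException {
        CountingStream merged = new CountingStream(sink);
        long[] lengths = new long[partitions.length];
        try {
            for (int i = 0; i < partitions.length; i++) {
                long start = merged.getByteCount();
                OutputStream partitionOutput = new GZIPOutputStream(new CloseShield(merged));
                partitionOutput.write(partitions[i].getBytes(StandardCharsets.UTF_8));
                // Closing writes the GZIP trailer; the shield keeps `merged` open.
                partitionOutput.close();
                lengths[i] = merged.getByteCount() - start;
            }
        } finally {
            merged.close();
        }
        return lengths;
    }
}
```

The position of a timing wrapper in such a chain decides what it measures: wrapped around the compressor it would include compression time, while placed directly above the shielded sink it would see only the writes of already-compressed bytes. That difference is what the comment above is weighing.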





[GitHub] spark issue #16066: [SPARK-18632][SQL] AggregateFunction should not implemen...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16066
  
**[Test build #69358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69358/consoleFull)** for PR 16066 at commit [`3ba68de`](https://github.com/apache/spark/commit/3ba68deb0fbdb3885bb13ff20e49a055320a4588).





[GitHub] spark pull request #15954: [SPARK-18516][SQL] Split state and progress in st...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/15954#discussion_r90133572
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
 ---
@@ -59,13 +62,20 @@ class StreamingQueryManager private[sql] (sparkSession: SparkSession) {
   /**
* Returns the query if there is an active query with the given id, or null.
*
-   * @since 2.0.0
+   * @since 2.1.0
*/
-  def get(id: Long): StreamingQuery = activeQueriesLock.synchronized {
+  def get(id: UUID): StreamingQuery = activeQueriesLock.synchronized {
 activeQueries.get(id).orNull
   }
 
   /**
+   * Returns the query if there is an active query with the given id, or null.
+   *
+   * @since 2.1.0
+   */
+  def get(id: String): StreamingQuery = get(UUID.fromString(id))
--- End diff --

I think that's okay. A globally unique ID is a better identifier.
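
The overload pattern in the diff (a lookup keyed by `UUID`, plus a `String` convenience overload that parses and delegates) can be sketched in plain Java as follows. `QueryRegistry` and its `register` method are hypothetical names for illustration and are not part of Spark's `StreamingQueryManager`; only `UUID.fromString` and the "return null when absent" behaviour mirror the diff.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry keyed by globally unique ids; an assumption, not Spark code. */
final class QueryRegistry {
  private final Map<UUID, String> active = new ConcurrentHashMap<>();

  /** Registers a query name under a freshly generated id and returns that id. */
  UUID register(String name) {
    UUID id = UUID.randomUUID();
    active.put(id, name);
    return id;
  }

  /** Returns the query registered under the given id, or null if there is none. */
  String get(UUID id) {
    return active.get(id);
  }

  /** Parses the textual id and delegates; a malformed id raises IllegalArgumentException. */
  String get(String id) {
    return get(UUID.fromString(id));
  }
}
```

One consequence of this design is that a malformed string fails loudly in `UUID.fromString` rather than returning null, so callers can tell "no such query" apart from "not a valid id".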





[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-29 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16044
  
LGTM - merging to master/2.1. Thanks!





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15954
  
**[Test build #69346 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69346/consoleFull)**
 for PR 15954 at commit 
[`d9d8f82`](https://github.com/apache/spark/commit/d9d8f82e0adfb23223e6d445f0f832824b08cf9b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16068: stateful udf should be nondeterministic

2016-11-29 Thread zhzhan
GitHub user zhzhan opened a pull request:

https://github.com/apache/spark/pull/16068

stateful udf should be nondeterministic

## What changes were proposed in this pull request?

Make stateful UDFs nondeterministic.

## How was this patch tested?

Mainly relies on existing queries. We also manually checked queries with a 
stateful UDF in the filter. Without the patch, the UDF is mistakenly pushed 
down for efficiency. After the patch, the physical plan is generated correctly.
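
As a rough illustration of why pushing a stateful function below another filter changes results, here is a sketch in plain Java with no Spark APIs. `StatefulFilterDemo` and `firstN` are hypothetical: `firstN` stands in for a stateful UDF whose answer depends on how many rows it has already been called on, and the two methods stand in for the plan with and without the pushdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical demo of filter reordering with a stateful predicate; not Spark code. */
final class StatefulFilterDemo {
  /** Stateful "UDF": returns true only for the first `limit` rows it is evaluated on. */
  static IntPredicate firstN(int limit) {
    int[] seen = {0};
    return value -> ++seen[0] <= limit;
  }

  /** Applies a predicate to the rows in order and keeps the rows that pass. */
  static List<Integer> filter(List<Integer> rows, IntPredicate predicate) {
    List<Integer> kept = new ArrayList<>();
    for (int row : rows) {
      if (predicate.test(row)) {
        kept.add(row);
      }
    }
    return kept;
  }

  /** Order as written: the deterministic filter runs first, then the stateful one. */
  static List<Integer> deterministicFirst(List<Integer> rows) {
    return filter(filter(rows, v -> v % 2 == 0), firstN(3));
  }

  /** Order after a pushdown: the stateful filter runs first and sees different rows. */
  static List<Integer> statefulFirst(List<Integer> rows) {
    return filter(filter(rows, firstN(3)), v -> v % 2 == 0);
  }
}
```

For the rows 1 through 6, the first order keeps 2, 4 and 6, while the pushed-down order keeps only 2, because the stateful predicate spends its three "true" answers on rows 1, 2 and 3. Treating such a function as nondeterministic is what tells an optimizer not to reorder it.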



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhzhan/spark state

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16068.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16068


commit a4d8b4af648e53c355bee16fe371137d0b349331
Author: Zhan Zhang 
Date:   2016-11-29T23:32:45Z

stateful udf should be nondeterministic







[GitHub] spark issue #15982: [SPARK-18546][core] Fix merging shuffle spills when usin...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15982
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15982: [SPARK-18546][core] Fix merging shuffle spills when usin...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15982
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69348/
Test PASSed.





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16067
  
**[Test build #69350 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69350/consoleFull)**
 for PR 16067 at commit 
[`0722ae5`](https://github.com/apache/spark/commit/0722ae52d4b4031b4ff2751d22c787b070547fa0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16063: [SPARK-18622][SQL] Remove TypeCoercion rules for Average...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16063
  
**[Test build #69363 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69363/consoleFull)**
 for PR 16063 at commit 
[`027c31f`](https://github.com/apache/spark/commit/027c31f87c58013a9147b49da5c1fc177b2fb034).





[GitHub] spark issue #16066: [SPARK-18632][SQL] AggregateFunction should not implemen...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16066
  
**[Test build #69358 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69358/consoleFull)**
 for PR 16066 at commit 
[`3ba68de`](https://github.com/apache/spark/commit/3ba68deb0fbdb3885bb13ff20e49a055320a4588).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15954: [SPARK-18516][SQL] Split state and progress in streaming

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15954
  
**[Test build #69353 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69353/consoleFull)**
 for PR 15954 at commit 
[`c11d2e5`](https://github.com/apache/spark/commit/c11d2e51dd1bbbcededeb48db83dd8e060f9c0ae).





[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15982#discussion_r90127514
  
--- Diff: 
core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java
 ---
@@ -75,13 +75,6 @@
   @Mock(answer = RETURNS_SMART_NULLS) BlockManager blockManager;
   @Mock(answer = RETURNS_SMART_NULLS) DiskBlockManager diskBlockManager;
 
-  private static final class WrapStream extends AbstractFunction1 {
--- End diff --

You can remove the imports of `AbstractFunction1` and `OutputStream` 
after this change.





[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15982#discussion_r90121766
  
--- Diff: 
core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java ---
@@ -337,42 +340,47 @@ void forceSorterToSpill() throws IOException {
 final int numPartitions = partitioner.numPartitions();
 final long[] partitionLengths = new long[numPartitions];
 final InputStream[] spillInputStreams = new FileInputStream[spills.length];
-OutputStream mergedFileOutputStream = null;
+
+// Use a counting output stream to avoid having to close the underlying file and ask
+// the file system for its size after each partition is written.
+final CountingOutputStream mergedFileOutputStream = new CountingOutputStream(
+  new FileOutputStream(outputFile));
 
 boolean threwException = true;
 try {
   for (int i = 0; i < spills.length; i++) {
 spillInputStreams[i] = new FileInputStream(spills[i].file);
   }
   for (int partition = 0; partition < numPartitions; partition++) {
-final long initialFileLength = outputFile.length();
-mergedFileOutputStream =
-  new TimeTrackingOutputStream(writeMetrics, new FileOutputStream(outputFile, true));
+final long initialFileLength = mergedFileOutputStream.getByteCount();
+// Shield the underlying output stream from close() calls, so that we can close the higher
+// level streams to make sure all data is really flushed and internal state is cleaned.
+OutputStream partitionOutput = new CloseShieldOutputStream(mergedFileOutputStream);
+partitionOutput = blockManager.serializerManager().wrapForEncryption(partitionOutput);
 if (compressionCodec != null) {
-  mergedFileOutputStream = compressionCodec.compressedOutputStream(mergedFileOutputStream);
+  partitionOutput = compressionCodec.compressedOutputStream(partitionOutput);
 }
-
+partitionOutput = new TimeTrackingOutputStream(writeMetrics, partitionOutput);
--- End diff --

Another change here is that `TimeTrackingOutputStream` now goes around the 
compression codec. I think that is the right change, but it's at least worth 
mentioning in the commit message.

I'm wondering if it's worth having a separate JIRA for this, since it will 
affect metrics for all users.
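
The merge loop in the diff above keeps one long-lived output stream, measures each partition as a difference of byte counts, and closes the per-partition wrappers without closing the file. A minimal sketch of that pattern in plain Java follows. `MergeSketch`, `Counting` and `CloseShield` are hand-written, hypothetical stand-ins for the `CountingOutputStream` and `CloseShieldOutputStream` named in the diff, GZIP substitutes for Spark's compression codec, and encryption and time tracking are left out.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of per-partition lengths over one shared stream; not Spark code. */
final class MergeSketch {
  /** Counts the bytes that reach the underlying stream. */
  static final class Counting extends FilterOutputStream {
    long count = 0;

    Counting(OutputStream out) { super(out); }

    @Override public void write(int b) throws IOException {
      out.write(b);
      count++;
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
      out.write(b, off, len);
      count += len;
    }
  }

  /** Forwards writes, but turns close() into flush() so the shared stream stays open. */
  static final class CloseShield extends FilterOutputStream {
    CloseShield(OutputStream out) { super(out); }

    @Override public void write(byte[] b, int off, int len) throws IOException {
      out.write(b, off, len);
    }

    @Override public void close() throws IOException {
      out.flush();
    }
  }

  /** Writes each partition as its own compressed segment and returns the segment lengths. */
  static long[] writePartitions(byte[][] partitions, OutputStream sink) {
    long[] lengths = new long[partitions.length];
    try (Counting merged = new Counting(sink)) {
      for (int i = 0; i < partitions.length; i++) {
        long start = merged.count;
        OutputStream part = new GZIPOutputStream(new CloseShield(merged));
        part.write(partitions[i]);
        part.close();  // flushes the codec trailer; the shield keeps `merged` open
        lengths[i] = merged.count - start;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lengths;
  }
}
```

Closing each per-partition wrapper matters because a codec may hold buffered data and a trailer until `close()`; the shield lets that happen without ending the shared stream, and the counter removes the need to ask the file system for the file length after every partition.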





[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15982#discussion_r90127615
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
 ---
@@ -86,14 +88,7 @@ public int compare(
 
   protected boolean shouldUseRadixSort() { return false; }
 
-  private final long pageSizeBytes = new SparkConf().getSizeAsBytes("spark.buffer.pageSize", "4m");
-
-  private static final class WrapStream extends AbstractFunction1 {
--- End diff --

same on trimming imports





[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15982#discussion_r90126726
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -40,9 +41,11 @@
 import org.mockito.stubbing.Answer;
 
 import org.apache.spark.HashPartitioner;
+import org.apache.spark.SecurityManager;
 import org.apache.spark.ShuffleDependency;
 import org.apache.spark.SparkConf;
 import org.apache.spark.TaskContext;
+import org.apache.spark.deploy.SparkHadoopUtil;
--- End diff --

Other than `CryptoStreamUtils`, the added imports look unused. It also 
looks like you can remove `AbstractFunction1` and `ByteStreams`, since you 
are no longer using them.




