date:20150203

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4066#discussion_r24039211
  
--- Diff: core/src/main/scala/org/apache/spark/CommitDeniedException.scala ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
--- End diff --

super-nit: since this exception is really only thrown by executors, perhaps 
put it in the `executor` package, to avoid polluting the `spark` package?
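
A minimal sketch of what that move might look like, assuming the class takes only a message string. The constructor shown here is hypothetical, since the quoted diff stops at the package line and does not show the class body:

```scala
// Hypothetical sketch: declare the exception under the executor package
// rather than the top-level org.apache.spark package.
package org.apache.spark.executor

// The single `msg` parameter is an assumption for illustration only;
// the actual fields in the PR are not visible in the quoted diff.
class CommitDeniedException(msg: String) extends Exception(msg)
```

Callers outside that package would then need `import org.apache.spark.executor.CommitDeniedException`.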


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5155] [PySpark] [Streaming] Mqtt stream...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4229#issuecomment-72733949
  
  [Test build #26668 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26668/consoleFull)
 for   PR 4229 at commit 
[`3810c7d`](https://github.com/apache/spark/commit/3810c7dace8f8bdf531da958c54e0a302cb3e199).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MQTTUtils(object):`




[GitHub] spark pull request: [SPARK-5182] [SPARK-5528] [SPARK-5509] [SPARK-...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4308#issuecomment-72734625
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26667/
Test FAILed.



[GitHub] spark pull request: SPARK-5548: Fixed a race condition in AkkaUtil...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4343#issuecomment-72735757
  
  [Test build #26675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26675/consoleFull)
 for   PR 4343 at commit 
[`b9ba47e`](https://github.com/apache/spark/commit/b9ba47e635cb31d3a66c7417cb0048edd667ed42).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72619816
  
  [Test build #26635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26635/consoleFull)
 for   PR 4331 at commit 
[`3ab2661`](https://github.com/apache/spark/commit/3ab26614b5278edce6e8571e5c51fe0b67e3124e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Dsl(object):`
  * `class ExamplePointUDT(UserDefinedType):`
  * `class SQLTests(ReusedPySparkTestCase):`




[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72619824
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26635/
Test FAILed.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72620154
  
  [Test build #575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/575/consoleFull)
 for   PR 4331 at commit 
[`35ccb9f`](https://github.com/apache/spark/commit/35ccb9f5721266a3a25df7e5f6d4b2c98f5f18d5).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-72622614
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26638/
Test FAILed.



[GitHub] spark pull request: [SPARK-5555] Enable UISeleniumSuite tests

2015-02-03 Thread JoshRosen

GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/4334

[SPARK-5555] Enable UISeleniumSuite tests

This patch enables UISeleniumSuite, a set of tests for the Spark 
application web UI.  These tests were previously disabled because they were 
slow, but I think we now have sufficient test time budget that the benefit of 
enabling them outweighs the time costs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark enable-uiseleniumsuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4334


commit a5ab595476dc1d9addee4574d3f1bb8d9ff7ee79
Author: Josh Rosen joshro...@databricks.com
Date:   2015-02-03T09:22:40Z

Enable UISeleniumSuite tests.

commit 71efc72bd3958be7f800fa2d07802805de2da828
Author: Josh Rosen joshro...@databricks.com
Date:   2015-02-03T09:52:27Z

Update broken UISeleniumSuite tests; use random port #.





[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-72622599
  
  [Test build #26638 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26638/consoleFull)
 for   PR 4068 at commit 
[`1b8f924`](https://github.com/apache/spark/commit/1b8f9242b323d14010b2cfa743b23fab82177bee).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression`
  * `case class GetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression`




[GitHub] spark pull request: [SPARK-5555] Enable UISeleniumSuite tests

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4334#issuecomment-72633082
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26650/
Test PASSed.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72631120
  
  [Test build #26649 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26649/consoleFull)
 for   PR 4331 at commit 
[`35ccb9f`](https://github.com/apache/spark/commit/35ccb9f5721266a3a25df7e5f6d4b2c98f5f18d5).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72631133
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26649/
Test FAILed.



[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...

2015-02-03 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3642#issuecomment-72631442
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3642#issuecomment-72632170
  
  [Test build #26651 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26651/consoleFull)
 for   PR 3642 at commit 
[`0b9017f`](https://github.com/apache/spark/commit/0b9017fef57e5512d539146fafd9aa1e12b966ae).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72632209
  
  [Test build #575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/575/consoleFull)
 for   PR 4331 at commit 
[`35ccb9f`](https://github.com/apache/spark/commit/35ccb9f5721266a3a25df7e5f6d4b2c98f5f18d5).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...

2015-02-03 Thread WangTaoTheTonic

GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/spark/pull/4335

[SPARK-2945][YARN][Doc]add doc for spark.executor.instances

https://issues.apache.org/jira/browse/SPARK-2945

`spark.executor.instances` works. As the JIRA recommends, we should add 
docs for it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/spark SPARK-2945

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4335


commit 46c40c4ddc72b45d6f070b54e5e8e85b68ee0add
Author: WangTaoTheTonic barneystin...@aliyun.com
Date:   2015-02-03T09:28:56Z

add doc for spark.executor.instances





[GitHub] spark pull request: [SPARK-5555] Enable UISeleniumSuite tests

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4334#issuecomment-72633068
  
  [Test build #26650 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26650/consoleFull)
 for   PR 4334 at commit 
[`71efc72`](https://github.com/apache/spark/commit/71efc72bd3958be7f800fa2d07802805de2da828).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72633362
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26645/
Test FAILed.



[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4335#issuecomment-72633373
  
  [Test build #26652 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26652/consoleFull)
 for   PR 4335 at commit 
[`46c40c4`](https://github.com/apache/spark/commit/46c40c4ddc72b45d6f070b54e5e8e85b68ee0add).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72633350
  
**[Test build #26645 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26645/consoleFull)**
 for PR 4066 at commit 
[`c79df98`](https://github.com/apache/spark/commit/c79df9821d91139019fda6f943e764a01d91c7c3)
 after a configured wait of `120m`.



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4325#issuecomment-72625660
  
  [Test build #26642 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26642/consoleFull)
 for   PR 4325 at commit 
[`096e20d`](https://github.com/apache/spark/commit/096e20d5de068157910372a03a6face9edc829e6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4325#issuecomment-72625668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26642/
Test PASSed.



[GitHub] spark pull request: [SQL] DataFrame API update

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4332#issuecomment-72626851
  
  [Test build #26644 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26644/consoleFull)
 for   PR 4332 at commit 
[`ab0aa69`](https://github.com/apache/spark/commit/ab0aa69d2df6ba40359953e32883505ddc309e4f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL] DataFrame API update

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4332#issuecomment-72626858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26644/
Test PASSed.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72627568
  
**[Test build #26637 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26637/consoleFull)**
 for PR 4066 at commit 
[`92e6dc9`](https://github.com/apache/spark/commit/92e6dc96530351b54cb8eb9944d90b7664776a79)
 after a configured wait of `120m`.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72627578
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26637/
Test FAILed.



[GitHub] spark pull request: Minor: Fix TaskContext deprecated annotations.

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4333#issuecomment-72627902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26648/
Test PASSed.



[GitHub] spark pull request: Minor: Fix TaskContext deprecated annotations.

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4333#issuecomment-72627893
  
  [Test build #26648 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26648/consoleFull)
 for   PR 4333 at commit 
[`61c44ee`](https://github.com/apache/spark/commit/61c44ee843f8b3c94a094b63275ae1d37c870b64).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL][Hiveconsole] Bring hive console code up ...

2015-02-03 Thread OopsOutOfMemory

Github user OopsOutOfMemory commented on the pull request:

https://github.com/apache/spark/pull/4330#issuecomment-72628043
  
retest this please



[GitHub] spark pull request: [SQL] Minor changes for DataFrame Implementati...

2015-02-03 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4339#issuecomment-72762303
  
Hey @chenghao-intel, thanks for doing this. I'm making changes to this file 
right now, so I will roll your change in.



[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4344#issuecomment-72762127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26681/
Test FAILed.



[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4344#issuecomment-72762121
  
  [Test build #26681 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26681/consoleFull)
 for   PR 4344 at commit 
[`33a84fe`](https://github.com/apache/spark/commit/33a84fe22bb53577450c918b2db4ae7150cc4ab8).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SimpleFunctionRegistry(val caseSensitive: Boolean) extends 
FunctionRegistry `
  * `class StringKeyHashMap[T](normalizer: (String) => String) `
  * `case class MultiAlias(child: Expression, names: Seq[String])`




[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72762489
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26682/
Test PASSed.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72762480
  
  [Test build #26682 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26682/consoleFull)
 for   PR 4066 at commit 
[`459310a`](https://github.com/apache/spark/commit/459310af0ff6543daa5c63c12faa76c1beeda109).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class CommitDeniedException(`
  * `  class OutputCommitCoordinatorActor(outputCommitCoordinator: 
OutputCommitCoordinator)`
  * `class GaussianMixtureModel(object):`
  * `class GaussianMixture(object):`
  * `class MultivariateGaussian(namedtuple('MultivariateGaussian', ['mu', 
'sigma'])):`
  * `class KafkaUtils(object):`
  * `public class JDBCUtils `
  * `trait Column extends DataFrame with ExpressionApi `
  * `class ColumnName(name: String) extends IncomputableColumn(name) `
  * `trait DataFrame extends DataFrameSpecificApi with RDDApi[Row] `
  * `class GroupedDataFrame protected[sql](df: DataFrameImpl, 
groupingExprs: Seq[Expression])`
  * `  protected[sql] class QueryExecution(val logical: LogicalPlan) `
  * `  logWarning(s"Couldn't find class $driver", e);`
  * `  implicit class JDBCDataFrame(rdd: DataFrame) `




[GitHub] spark pull request: [SQL][DataFrame] defineUDF.

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4345#issuecomment-72762449
  
  [Test build #26690 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26690/consoleFull)
 for   PR 4345 at commit 
[`639c0f8`](https://github.com/apache/spark/commit/639c0f8663c942b4f610e8256d6bb3bead20fbde).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72763376
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26679/
Test PASSed.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72763371
  
  [Test build #26679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26679/consoleFull)
 for   PR 4331 at commit 
[`dd9919f`](https://github.com/apache/spark/commit/dd9919f115d3b8f4b66d213c4a57bc832ed8ed57).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Dsl(object):`
  * `class ExamplePointUDT(UserDefinedType):`
  * `class SQLTests(ReusedPySparkTestCase):`




[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72763611
  
Thanks. Merging in master.



[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24052353
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -17,14 +17,17 @@
 
 package org.apache.spark.mllib.classification
 
+import org.apache.spark.SparkContext
 import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.classification.impl.GLMClassificationModel
 import org.apache.spark.mllib.linalg.BLAS.dot
 import org.apache.spark.mllib.linalg.{DenseVector, Vector}
 import org.apache.spark.mllib.optimization._
 import org.apache.spark.mllib.regression._
-import org.apache.spark.mllib.util.{DataValidators, MLUtils}
+import org.apache.spark.mllib.util.{DataValidators, Exportable, Importable}
--- End diff --

About the names, we have `Exportable` with `save()` and `Importable` with 
`load()`. I have two questions:

1) Would `Saveable` and `save` be a better match? I'm looking at the search 
results from grepcode:

* `Saveable` as an interface: 
http://grepcode.com/search/?start=0query=Saveableentity=typek=i
* `Exportable` as an interface: 
http://grepcode.com/search?query=Exportablestart=0entity=typen=k=i

People use both, but I didn't see a combination of `Exportable` and 
`save()`. I hope `savable` or `saveable` is still a valid word. The same applies to 
`Importable`/`Loadable` and `load()`.

2) If an instance is `Importable`, it means we can import it from 
somewhere. This is not true for `object Model`. We don't import `object Model` 
but we use `object Model` to import `class Model`. Should the interface be 
called `Importer`/`Loader` instead?
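
To make the naming question concrete, here is a minimal sketch of the `Saveable`/`Loader` split, assuming an in-memory map in place of real storage. `NamingSketch`, `store`, and this toy `Model` are hypothetical stand-ins, not code from the PR.

```scala
import scala.collection.mutable

object NamingSketch {
  // Hypothetical stand-in for a storage backend; real code would write to a file system.
  val store: mutable.Map[String, Array[Double]] = mutable.Map.empty

  /** Mixed into model instances: an instance knows how to save itself. */
  trait Saveable {
    def save(path: String): Unit
  }

  /** Mixed into companion objects: the object loads instances of M. */
  trait Loader[M <: Saveable] {
    def load(path: String): M
  }

  class Model(val weights: Array[Double]) extends Saveable {
    override def save(path: String): Unit = store.update(path, weights.clone())
  }

  // The object is not itself loaded; it is the thing that does the loading.
  object Model extends Loader[Model] {
    override def load(path: String): Model = new Model(store(path).clone())
  }
}
```

With this split, `new Model(...).save(path)` and `Model.load(path)` read naturally, and the trait on the companion object describes what the object does rather than what can be done to it.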



[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24052532
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -65,6 +67,70 @@ class NaiveBayesModel private[mllib] (
   override def predict(testData: Vector): Double = {
 labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
   }
+
+  override def save(sc: SparkContext, path: String): Unit = {
+val sqlContext = new SQLContext(sc)
+import sqlContext._
+
+// Create JSON metadata.
+val metadataRDD =
+  sc.parallelize(Seq((this.getClass.getName, 
formatVersion))).toDataFrame("class", "version")
+metadataRDD.toJSON.repartition(1).saveAsTextFile(path + "/metadata")
+
+// Create Parquet data.
+val data = NaiveBayesModel.Data(labels, pi, theta)
+val dataRDD: DataFrame = sc.parallelize(Seq(data))
+dataRDD.repartition(1).saveAsParquetFile(path + "/data")
+  }
+
+  override protected def formatVersion: String = 
NaiveBayesModel.formatVersion
+
+}
+
+object NaiveBayesModel extends Importable[NaiveBayesModel] {
+
+  /** Model data for model import/export */
+  private case class Data(labels: Array[Double], pi: Array[Double], theta: 
Array[Array[Double]])
--- End diff --

This `Data` class should live inside `ImporterV1`. If `ImporterV2` uses a 
different format, it is hard to update `Data` if it is global to both importers.
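
A rough sketch of that layout, assuming a made-up second format: each importer owns the `Data` class for its own version, so changing one format cannot break the other. `VersionedImportSketch`, the simplified `Model`, and the shape of the V2 data are all hypothetical.

```scala
object VersionedImportSketch {
  // Simplified stand-in for the in-memory model.
  final case class Model(labels: Seq[Double], pi: Seq[Double])

  object ImporterV1 {
    // Format 1 stores labels and priors as parallel sequences.
    final case class Data(labels: Seq[Double], pi: Seq[Double])

    def load(data: Data): Model = Model(data.labels, data.pi)
  }

  object ImporterV2 {
    // A hypothetical format 2 stores (label, prior) pairs instead.
    final case class Data(entries: Seq[(Double, Double)])

    def load(data: Data): Model =
      Model(data.entries.map(_._1), data.entries.map(_._2))
  }
}
```

Both importers produce the same `Model`, but neither depends on the other's on-disk schema.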



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72763987
  
Also, this could probably benefit from a more comprehensive unit test of 
the commit coordinator itself, since I think it's possible to alter the current 
implementation in ways that will introduce bugs that won't be caught by the 
current test (e.g. by swapping the id types).



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-02-03 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4215#issuecomment-72763888
  
LGTM pending tests.



[GitHub] spark pull request: [SPARK-5460][MLlib] Wrapped `Try` around `dele...

2015-02-03 Thread x1-

GitHub user x1- opened a pull request:

https://github.com/apache/spark/pull/4347

[SPARK-5460][MLlib] Wrapped `Try` around `deleteAllCheckpoints` - 
RandomForest.

`deleteAllCheckpoints` can throw an `IOException`, so the call is now 
wrapped in `Try`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/x1-/spark SPARK-5460

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4347


commit 15725763b838a14bbb291ad6fc87e42fec52fb95
Author: x1- viva...@gmail.com
Date:   2015-02-03T10:39:30Z

Wrapped `Try` around `deleteAllCheckpoints` - RandomForest.

Because `deleteAllCheckpoints` has IOException potential.





[GitHub] spark pull request: [SPARK-5460][MLlib] Wrapped `Try` around `dele...

2015-02-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4347#discussion_r24053119
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -244,7 +245,10 @@ private class RandomForest (
 
 // Delete any remaining checkpoints used for node Id cache.
 if (nodeIdCache.nonEmpty) {
-  nodeIdCache.get.deleteAllCheckpoints()
+  Try(nodeIdCache.get.deleteAllCheckpoints()) match {
+case Failure(e) => logWarning(s"delete all chackpoints faild. 
Error reason: ${e.getMessage}")
--- End diff --

This has a typo (faild). Are you sure you want to continue in this case?
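
For reference, a minimal sketch of the two choices in play, assuming the cleanup is passed in as a function: continue with a warning when the failure is an `IOException`, and rethrow anything else. `CleanupSketch` and `cleanup` are hypothetical names, and returning the warning text stands in for a call to `logWarning`.

```scala
import java.io.IOException
import scala.util.{Failure, Success, Try}

object CleanupSketch {
  // Returns None when cleanup succeeded, or Some(warning) when it failed
  // with an IOException and we chose to continue anyway.
  def cleanup(deleteAll: () => Unit): Option[String] =
    Try(deleteAll()) match {
      case Success(_) => None
      case Failure(e: IOException) =>
        Some(s"Deleting checkpoints failed: ${e.getMessage}")
      // Anything other than an IOException is unexpected, so do not swallow it.
      case Failure(e) => throw e
    }
}
```

Whether continuing is acceptable depends on whether leftover checkpoint files are merely wasteful or can affect later runs; the sketch only shows how narrow the swallowed case can be kept.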



[GitHub] spark pull request: [SQL] Minor changes for DataFrame Implementati...

2015-02-03 Thread chenghao-intel

Github user chenghao-intel closed the pull request at:

https://github.com/apache/spark/pull/4339



[GitHub] spark pull request: [SQL] Minor changes for DataFrame Implementati...

2015-02-03 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4339#issuecomment-72765655
  
OK, I am closing it.



[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...

2015-02-03 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4336#discussion_r24053791
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala 
---
@@ -260,11 +260,11 @@ private[sql] class DataFrameImpl protected[sql](
 
   override def take(n: Int): Array[Row] = head(n)
 
-  override def collect(): Array[Row] = 
queryExecution.executedPlan.executeCollect()
+  override def collect(): Array[Row] = rdd.collect()
 
   override def collectAsList(): java.util.List[Row] = 
java.util.Arrays.asList(rdd.collect() :_*)
 
-  override def count(): Long = 
groupBy().count().rdd.collect().head.getLong(0)
+  override def count(): Long = rdd.count()
--- End diff --

Oh? If I understand correctly, `rdd.count()` is the most optimized option 
(partial aggregation is done before shuffling). @rxin, can you 
confirm that? Sorry if I am wrong.
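
The partial aggregation mentioned here can be pictured with plain collections, assuming each inner `Seq` plays the role of one partition. This is only an illustration of the idea, not Spark's implementation; `PartialCountSketch` is a hypothetical name.

```scala
object PartialCountSketch {
  def count[T](partitions: Seq[Seq[T]]): Long = {
    // Step 1: each partition is reduced to a single number where the data lives.
    val partialCounts: Seq[Long] = partitions.map(_.size.toLong)
    // Step 2: only those per-partition numbers need to be combined.
    partialCounts.sum
  }
}
```

Only one `Long` per partition has to move in the second step, which is why counting this way is cheap regardless of row width.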




[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date

2015-02-03 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4307#issuecomment-72767656
  
BTW the pull request no longer merges cleanly with the master branch. Would 
be great to update. Thanks!




[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date

2015-02-03 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4307#issuecomment-72767614
  
Can you add a test case for this?




[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24054409
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
 ---
@@ -425,25 +427,25 @@ class LogisticRegressionSuite extends FunSuite with 
MLlibTestSparkContext with M
  * weights. The mathematical discussion and proof can be found here:
  * http://en.wikipedia.org/wiki/Multinomial_logistic_regression
  *
- *weights1 = weights$`1` - weights$`0`
- *weights2 = weights$`2` - weights$`0`
+ * weights1 = weights$`1` - weights$`0`
+ * weights2 = weights$`2` - weights$`0`
  *
- * weights1
- *5 x 1 sparse Matrix of class dgCMatrix
- *s0
- * 2.6228269
- *data.V2 -0.5837166
- *data.V3  0.9285260
- *data.V4 -0.3783612
- *data.V5 -0.8123411
- * weights2
- *5 x 1 sparse Matrix of class dgCMatrix
- * s0
- * 4.11197445
- *data.V2 -0.16918650
- *data.V3 -0.81104784
- *data.V4 -0.06463799
- *data.V5 -0.29198337
+ *  weights1
+ * 5 x 1 sparse Matrix of class dgCMatrix
+ * s0
+ * 2.6228269
--- End diff --

The original indentation is correct. Is it by accident?



[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24054690
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -65,6 +67,70 @@ class NaiveBayesModel private[mllib] (
   override def predict(testData: Vector): Double = {
 labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
   }
+
+  override def save(sc: SparkContext, path: String): Unit = {
+val sqlContext = new SQLContext(sc)
+import sqlContext._
+
+// Create JSON metadata.
+val metadataRDD =
+  sc.parallelize(Seq((this.getClass.getName, 
formatVersion))).toDataFrame("class", "version")
+metadataRDD.toJSON.repartition(1).saveAsTextFile(path + "/metadata")
+
+// Create Parquet data.
+val data = NaiveBayesModel.Data(labels, pi, theta)
+val dataRDD: DataFrame = sc.parallelize(Seq(data))
+dataRDD.repartition(1).saveAsParquetFile(path + "/data")
+  }
+
+  override protected def formatVersion: String = 
NaiveBayesModel.formatVersion
+
+}
+
+object NaiveBayesModel extends Importable[NaiveBayesModel] {
+
+  /** Model data for model import/export */
+  private case class Data(labels: Array[Double], pi: Array[Double], theta: 
Array[Array[Double]])
--- End diff --

Good point



[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24054729
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
 ---
@@ -425,25 +427,25 @@ class LogisticRegressionSuite extends FunSuite with 
MLlibTestSparkContext with M
  * weights. The mathematical discussion and proof can be found here:
  * http://en.wikipedia.org/wiki/Multinomial_logistic_regression
  *
- *weights1 = weights$`1` - weights$`0`
- *weights2 = weights$`2` - weights$`0`
+ * weights1 = weights$`1` - weights$`0`
+ * weights2 = weights$`2` - weights$`0`
  *
- * weights1
- *5 x 1 sparse Matrix of class dgCMatrix
- *s0
- * 2.6228269
- *data.V2 -0.5837166
- *data.V3  0.9285260
- *data.V4 -0.3783612
- *data.V5 -0.8123411
- * weights2
- *5 x 1 sparse Matrix of class dgCMatrix
- * s0
- * 4.11197445
- *data.V2 -0.16918650
- *data.V3 -0.81104784
- *data.V4 -0.06463799
- *data.V5 -0.29198337
+ *  weights1
+ * 5 x 1 sparse Matrix of class dgCMatrix
+ * s0
+ * 2.6228269
--- End diff --

Weird, that's an accident.  I'll revert it.



[GitHub] spark pull request: [SPARK-5520][MLlib] Make FP-Growth implementat...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4340#issuecomment-72769602
  
  [Test build #26689 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26689/consoleFull)
 for   PR 4340 at commit 
[`f5acf84`](https://github.com/apache/spark/commit/f5acf84942cdef917968facb93446c6dce70af28).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel[Item: ClassTag](`




[GitHub] spark pull request: [SPARK-5520][MLlib] Make FP-Growth implementat...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4340#issuecomment-72769609
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26689/
Test PASSed.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72770487
  
> Still, though, I'd like to consider whether there's a good way to fix 
> this in order to guarantee that requests to commit output for a stage occur 
> after the stage start event (from the OutputCommitter's POV).

I handled this by refactoring the code so that only remote RPCs are routed 
through the actor.  The methods called by DAGScheduler are now `synchronized` 
and directly update the hashmaps.
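
A simplified sketch of that arrangement, assuming a first-asker-wins policy: the scheduler-facing methods take the lock and update the maps directly, so a stage is registered before any commit request for it can be answered. `CommitCoordinatorSketch`, `Coordinator`, and the method names are illustrative and not the PR's actual API.

```scala
import scala.collection.mutable

object CommitCoordinatorSketch {
  class Coordinator {
    // stage id -> (partition id -> attempt id that was authorized to commit)
    private val authorized = mutable.Map.empty[Int, mutable.Map[Int, Long]]

    // Called directly by the scheduler: no message passing, just the lock.
    def stageStart(stage: Int): Unit = synchronized {
      authorized.getOrElseUpdate(stage, mutable.Map.empty[Int, Long])
      ()
    }

    def stageEnd(stage: Int): Unit = synchronized {
      authorized.remove(stage)
      ()
    }

    // Would be reached through the actor for remote requests.
    // The first attempt to ask for a partition wins; later attempts are denied.
    def canCommit(stage: Int, partition: Int, attempt: Long): Boolean = synchronized {
      authorized.get(stage) match {
        case Some(byPartition) =>
          byPartition.getOrElseUpdate(partition, attempt) == attempt
        case None => false // unknown or finished stage: deny
      }
    }
  }
}
```

A test against something like this is easiest to trust when the stage, partition, and attempt ids use distinct values, so that accidentally swapping two of them changes the outcome.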



[GitHub] spark pull request: [FIX][MLLIB] fix seed handling in Python GMM

2015-02-03 Thread mengxr

GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/4349

[FIX][MLLIB] fix seed handling in Python GMM

If `seed` is `None` on the Python side, it is passed in as `null`, so we 
should use `java.lang.Long` instead of `Long` to accept it.
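
A minimal sketch of the pattern, assuming a helper that falls back to a default when no seed is given. `SeedSketch` and `resolveSeed` are hypothetical names, not the methods changed in this PR.

```scala
object SeedSketch {
  // A primitive `Long` parameter cannot represent "no value",
  // so the boxed type is accepted and null is handled explicitly.
  def resolveSeed(seed: java.lang.Long, default: Long): Long =
    Option(seed).map(_.longValue).getOrElse(default)
}
```

`Option(seed)` turns a `null` into `None`, which keeps the null check in one place at the language boundary.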

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark gmm-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4349


commit 3be592612f9e4b5b6a1fbc2bf84ac006fa223bfb
Author: Xiangrui Meng m...@databricks.com
Date:   2015-02-04T01:00:39Z

fix seed handling in Python GMM





[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72770514
  
  [Test build #26695 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26695/consoleFull)
 for   PR 4066 at commit 
[`97da5fe`](https://github.com/apache/spark/commit/97da5feb6fe49255afaac1dc9d5db1edf8c1ff42).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread tdas

Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3798#issuecomment-72770451
  
Okay here are the two options.

1. createRDD returns RDD[(K,V)] or RDD[R], and DStream.foreachRDD uses 
rdd.asInstanceOf[HasOffsetRanges]
2. createRDD returns KafkaRDD[(K,V)] or KafkaRDD[R] and DStream.foreachRDD 
uses rdd.asInstanceOf[KafkaRDD[_]]

I think I am okay with either one. Stepping back, my original concern was 
returning something that had no binary compatibility issues. Both solutions 
suffice. Between these two, since you feel so strongly against (2), let's go 
with (1).
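
Option (1) can be sketched with toy types, assuming the concrete class stays private and callers cast to the trait. `SimpleRDD`, `KafkaBackedRDD`, and the fields of `OffsetRange` here are stand-ins chosen for illustration, not the real Spark classes.

```scala
object OffsetRangesSketch {
  final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

  trait HasOffsetRanges {
    def offsetRanges: Seq[OffsetRange]
  }

  // Stand-in for the general RDD type that createRDD advertises.
  class SimpleRDD[T](val items: Seq[T])

  // Stand-in for the concrete Kafka-backed class; it never appears in a public signature.
  private class KafkaBackedRDD[T](data: Seq[T], val offsetRanges: Seq[OffsetRange])
    extends SimpleRDD[T](data) with HasOffsetRanges

  // The declared return type is the general one, so the concrete class can change freely.
  def createRDD[T](data: Seq[T], ranges: Seq[OffsetRange]): SimpleRDD[T] =
    new KafkaBackedRDD(data, ranges)
}
```

A caller that needs the offsets would write `rdd.asInstanceOf[OffsetRangesSketch.HasOffsetRanges].offsetRanges`, depending only on the small trait rather than on the concrete class.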







[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...

2015-02-03 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4336#discussion_r24055322
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala 
---
@@ -260,11 +260,11 @@ private[sql] class DataFrameImpl protected[sql](
 
   override def take(n: Int): Array[Row] = head(n)
 
-  override def collect(): Array[Row] = 
queryExecution.executedPlan.executeCollect()
+  override def collect(): Array[Row] = rdd.collect()
 
   override def collectAsList(): java.util.List[Row] = 
java.util.Arrays.asList(rdd.collect() :_*)
 
-  override def count(): Long = 
groupBy().count().rdd.collect().head.getLong(0)
+  override def count(): Long = rdd.count()
--- End diff --

Hmm, but `rdd.count()` doesn't need to go through the Catalyst 
optimizer, does it? It's already processed in parallel.



[GitHub] spark pull request: [SPARK-5520][MLlib] Make FP-Growth implementat...

2015-02-03 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4340#issuecomment-72770558
  
LGTM. Merged into master and branch-1.3. Thanks!



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72770871
  
  [Test build #26691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26691/consoleFull)
 for   PR 4066 at commit 
[`f582574`](https://github.com/apache/spark/commit/f58257443b20835e952aa096f2a5a1a47bddb337).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class CommitDeniedException(`
  * `  class OutputCommitCoordinatorActor(outputCommitCoordinator: 
OutputCommitCoordinator)`




[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24055623
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
 ---
@@ -459,7 +461,41 @@ class LogisticRegressionSuite extends FunSuite with 
MLlibTestSparkContext with M
 // very steep curve in logistic function so that when we draw samples 
from distribution, it's
 // very easy to assign to another labels. However, this prediction 
result is consistent to R.
 
validatePrediction(model.predict(validationRDD.map(_.features)).collect(), 
validationData, 0.47)
+  }
+
+  test("model export/import") {
--- End diff --

That requires us to maintain the exporters for each version. I'm thinking 
of keeping one saved model as test resources and test against that.
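
A rough sketch of the "one saved model as a test resource" idea, in plain Scala. The string format, `LinearModel` and `ModelLoaderSketch` are all hypothetical; in a real suite the frozen artifact would presumably be a file checked in under the test resources rather than an inline string.

```scala
// Hypothetical model and format, for illustration only.
case class LinearModel(weights: Vector[Double], intercept: Double)

object ModelLoaderSketch {
  // A frozen V1 artifact. It is written once and never regenerated,
  // so no V1 exporter has to be kept alive just for the tests.
  val savedV1: String = "version=1;intercept=0.5;weights=1.0,2.0"

  // Split "k1=v1;k2=v2" into a key/value map.
  private def fields(serialized: String): Map[String, String] =
    serialized.split(";").map { kv =>
      val parts = kv.split("=", 2)
      parts(0) -> parts(1)
    }.toMap

  // Dispatch on the stored version; later versions would add further cases.
  def load(serialized: String): LinearModel = {
    val f = fields(serialized)
    f("version") match {
      case "1" =>
        LinearModel(f("weights").split(",").map(_.toDouble).toVector, f("intercept").toDouble)
      case other =>
        throw new IllegalArgumentException(s"unsupported format version: $other")
    }
  }
}
```

A test then only asserts that loading the frozen artifact still yields the expected model, whatever the current exporter writes.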



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72762448
  
  [Test build #26691 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26691/consoleFull)
 for   PR 4066 at commit 
[`f582574`](https://github.com/apache/spark/commit/f58257443b20835e952aa096f2a5a1a47bddb337).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4215#issuecomment-72763752
  
  [Test build #26692 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26692/consoleFull)
 for   PR 4215 at commit 
[`9215851`](https://github.com/apache/spark/commit/921585157be8e1eec9419715a5a0aa5614e6e16b).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72763768
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26677/
Test FAILed.



[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-03 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4106#issuecomment-72763704
  
Hey @mccheah - if you are too busy I think it's fine to let it slip past 
1.3, given that there are still several unknowns.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72763765
  
**[Test build #26677 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26677/consoleFull)**
 for PR 4066 at commit 
[`dd00b7c`](https://github.com/apache/spark/commit/dd00b7c83fd0a4fa1cbd9115f2e0a8e69bc519b9)
 after a configured wait of `120m`.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4331



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72763703
  
There's one subtle, extremely-unlikely race condition that can still occur 
here and which I would like to fix: currently, we seem to assume that messages 
sent from the DAGScheduler to the local OutputCommitCoordinator will be 
processed before commit requests for tasks.  However, we send these as 
fire-and-forget messages and do not wait for acknowledgements, so it would 
technically be legal for OutputCommitCoordinator to receive a request to commit 
for a task that belongs to a stage that OutputCommitCoordinator has not heard 
about.  This will cause the coordinator to deny the task attempt, since it will 
think that it's from a completed stage.  I think that this isn't a huge deal in 
practice, since I think we'll just end up scheduling another task (plus this 
particular race should be extremely unlikely).

Still, though, I'd like to consider whether there's a good way to fix this 
in order to guarantee that requests to commit output for a stage occur after 
the stage start event (from the OutputCommitter's POV).
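
To make the race concrete, here is a single-threaded toy model in plain Scala. `CoordinatorSketch` and its methods are hypothetical and much simpler than the real coordinator; the sketch only assumes, as described above, that a commit request for a stage the coordinator does not know about is denied.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the coordinator's bookkeeping.
class CoordinatorSketch {
  // stage -> (partition -> attempt that is authorized to commit)
  private val authorized = mutable.Map.empty[Int, mutable.Map[Int, Long]]

  def stageStart(stage: Int): Unit = {
    authorized.getOrElseUpdate(stage, mutable.Map.empty[Int, Long])
    ()
  }

  def stageEnd(stage: Int): Unit = {
    authorized.remove(stage)
    ()
  }

  // The first attempt to ask for a partition wins; unknown stages are denied.
  def canCommit(stage: Int, partition: Int, attempt: Long): Boolean =
    authorized.get(stage) match {
      case Some(byPartition) =>
        byPartition.getOrElseUpdate(partition, attempt) == attempt
      case None =>
        // Either the stage already finished, or stageStart has not arrived yet.
        false
    }
}
```

If the commit request overtakes the stage start message, the first attempt is denied and a later attempt gets the commit:

```scala
val coordinator = new CoordinatorSketch
coordinator.canCommit(stage = 0, partition = 0, attempt = 1L) // denied: stage 0 not yet known
coordinator.stageStart(0)
coordinator.canCommit(stage = 0, partition = 0, attempt = 2L) // granted to the rescheduled attempt
```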



[GitHub] spark pull request: [SPARK-5460][MLlib] Wrapped `Try` around `dele...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4347#issuecomment-72764883
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72765259
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26686/
Test PASSed.



[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72765252
  
  [Test build #26686 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26686/consoleFull)
 for   PR 4066 at commit 
[`997b41b`](https://github.com/apache/spark/commit/997b41b788d7f0df5feaa72b8af79d06fb24ee9f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class CommitDeniedException(`
  * `  class OutputCommitCoordinatorActor(outputCommitCoordinator: 
OutputCommitCoordinator)`




[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...

2015-02-03 Thread rxin

GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/4348

[SPARK-5579][SQL][DataFrame] Support for project/filter using SQL 
expressions

```scala
df.selectExpr("abs(colA)", "colB")
df.filter("age > 21")
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-5579

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4348.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4348


commit ac65f4b86bfce0ba2e170c31f6a50c58255f960e
Author: Reynold Xin r...@databricks.com
Date:   2015-02-04T00:25:50Z

[SPARK-5579][SQL][DataFrame] Support for project/filter using SQL 
expressions.

e.g.

df.selectExpr("abs(colA)", "colB")

df.filter("age > 21")





[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3233#issuecomment-72766658
  
  [Test build #26694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26694/consoleFull)
 for   PR 3233 at commit 
[`70d4044`](https://github.com/apache/spark/commit/70d40444911a11d72290a8c669aeb3a69d3afb47).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4348#issuecomment-7272
  
  [Test build #26693 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26693/consoleFull)
 for   PR 4348 at commit 
[`ac65f4b`](https://github.com/apache/spark/commit/ac65f4b86bfce0ba2e170c31f6a50c58255f960e).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir

2015-02-03 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4344#issuecomment-72767317
  
Do you know what the common root is? Is it just tmp? If so, maybe we do 
want to always add the spark prefix. If it is some special Spark tmp folder, 
then I think it is fine not to have the spark prefix.



[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...

2015-02-03 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4336#discussion_r24053977
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala 
---
@@ -260,11 +260,11 @@ private[sql] class DataFrameImpl protected[sql](
 
   override def take(n: Int): Array[Row] = head(n)
 
-  override def collect(): Array[Row] = 
queryExecution.executedPlan.executeCollect()
+  override def collect(): Array[Row] = rdd.collect()
 
   override def collectAsList(): java.util.List[Row] = 
java.util.Arrays.asList(rdd.collect() :_*)
 
-  override def count(): Long = 
groupBy().count().rdd.collect().head.getLong(0)
+  override def count(): Long = rdd.count()
--- End diff --

@marmbrus is correct. rdd.count() doesn't go through the optimizer. The 
original solution goes through the optimizer.

Maybe a better change is to add an inline comment explaining that the 
original form makes sure the query goes through the optimizer.



[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir

2015-02-03 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4344#issuecomment-72768195
  
52f575 added a user-specific root under the temp dir, so you'd always have 
these directories under one that's named `spark-[uuid]`.



[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24054410
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
 ---
@@ -459,7 +461,41 @@ class LogisticRegressionSuite extends FunSuite with 
MLlibTestSparkContext with M
 // very steep curve in logistic function so that when we draw samples 
from distribution, it's
 // very easy to assign to another labels. However, this prediction 
result is consistent to R.
 
validatePrediction(model.predict(validationRDD.map(_.features)).collect(), 
validationData, 0.47)
+  }
+
+  test("model export/import") {
--- End diff --

I have a question about upgradability. When we have `V2`, how do we test 
`ImporterV1`? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24054680
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -17,14 +17,17 @@
 
 package org.apache.spark.mllib.classification
 
+import org.apache.spark.SparkContext
 import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.classification.impl.GLMClassificationModel
 import org.apache.spark.mllib.linalg.BLAS.dot
 import org.apache.spark.mllib.linalg.{DenseVector, Vector}
 import org.apache.spark.mllib.optimization._
 import org.apache.spark.mllib.regression._
-import org.apache.spark.mllib.util.{DataValidators, MLUtils}
+import org.apache.spark.mllib.util.{DataValidators, Exportable, Importable}
--- End diff --

Saveable and Loader sound good to me. What about PMML? Does it sound 
reasonable to have separate traits for PMML, such as PMMLSaveable and 
PMMLLoadable?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...

2015-02-03 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4233#discussion_r24054795
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
 ---
@@ -459,7 +461,41 @@ class LogisticRegressionSuite extends FunSuite with 
MLlibTestSparkContext with M
 // very steep curve in logistic function so that when we draw samples 
from distribution, it's
 // very easy to assign to another labels. However, this prediction 
result is consistent to R.
 
validatePrediction(model.predict(validationRDD.map(_.features)).collect(), 
validationData, 0.47)
+  }
+
+  test("model export/import") {
--- End diff --

True, I guess we should have an analogous ExporterV1 and keep it around for 
testing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir

2015-02-03 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4344#issuecomment-72769307
  
Then this lgtm.



[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4345#issuecomment-72769530
  
  [Test build #26687 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26687/consoleFull)
 for   PR 4345 at commit 
[`b452b8d`](https://github.com/apache/spark/commit/b452b8d0c44488328367fed5b5f2eeba6f0c6c55).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4345#issuecomment-72769536
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26687/
Test PASSed.



[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4345#discussion_r24055018
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -132,14 +132,14 @@ class LogisticRegressionModel private[ml] (
   override def transform(dataset: DataFrame, paramMap: ParamMap): 
DataFrame = {
 transformSchema(dataset.schema, paramMap, logging = true)
 val map = this.paramMap ++ paramMap
-val scoreFunction: Vector => Double = (v) => {
+val scoreFunction = udf((v: Vector) => {
   val margin = BLAS.dot(v, weights)
   1.0 / (1.0 + math.exp(-margin))
-}
+} : Double)
--- End diff --

About the syntax, I like the following better

~~~
val margin = udf { v: Vector =>
  val margin = BLAS.dot(v, weights)
  1.0 / (1.0 + math.exp(-margin))
}
~~~




[GitHub] spark pull request: [SQL] Correct the default size of TimestampTyp...

2015-02-03 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4314#discussion_r24055002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/dataTypes.scala ---
@@ -402,7 +402,7 @@ case object DateType extends NativeType {
 }
 
 
-protected[sql] abstract class NumericType extends NativeType with 
PrimitiveType {
--- End diff --

OK, I see. Thanks for the explanation.



[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4345#issuecomment-72770311
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26688/
Test FAILed.



[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4345#issuecomment-72770304
  
  [Test build #26688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26688/consoleFull)
 for   PR 4345 at commit 
[`0a0b339`](https://github.com/apache/spark/commit/0a0b339f311573d775b1704ab2e0860e22746c3f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5520][MLlib] Make FP-Growth implementat...

2015-02-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4340



[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...

2015-02-03 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4345#discussion_r24055407
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -26,9 +27,8 @@ import org.apache.spark.sql.catalyst.errors.TreeNodeException
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.types._
 
-/* Implicits */
 import org.apache.spark.sql.TestData._
-import org.apache.spark.sql.test.TestSQLContext._
+import org.apache.spark.sql.test.TestSQLContext.{udf => _, _}
--- End diff --

Maybe it is useful to leave a comment here about what it does.
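
For readers unfamiliar with the import syntax under discussion, here is a minimal sketch of what a hiding import does in Scala. `FakeContext` and its members are hypothetical stand-ins invented for illustration; they are not the real `TestSQLContext` API.

```scala
// Hypothetical stand-in for a context object; the member names are illustrative only.
object FakeContext {
  def udf(name: String): String = s"context udf: $name"
  def sql(query: String): String = s"ran: $query"
}

object ImportHidingDemo {
  // A local definition sharing its name with FakeContext.udf.
  def udf(name: String): String = s"local udf: $name"

  // `udf => _` hides FakeContext.udf; the trailing `_` imports every other member.
  import FakeContext.{udf => _, _}

  // `udf` can only refer to the local definition; `sql` comes from the wildcard.
  def run(): (String, String) = (udf("f"), sql("SELECT 1"))
}
```

A one-line comment next to such an import, stating which member is hidden and why, is probably all the suggestion above asks for.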



[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...

2015-02-03 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4336#discussion_r24055445
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala 
---
@@ -260,11 +260,11 @@ private[sql] class DataFrameImpl protected[sql](
 
   override def take(n: Int): Array[Row] = head(n)
 
-  override def collect(): Array[Row] = queryExecution.executedPlan.executeCollect()
+  override def collect(): Array[Row] = rdd.collect()
 
   override def collectAsList(): java.util.List[Row] = java.util.Arrays.asList(rdd.collect() :_*)
 
-  override def count(): Long = groupBy().count().rdd.collect().head.getLong(0)
+  override def count(): Long = rdd.count()
--- End diff --

As an example of a query that can take advantage of the optimizer:

df.count()

If you run count from rdd, then all columns are extracted. If you run count 
as is, no actual columns are read.
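
The difference can be sketched with a toy model. This is only an illustration of column pruning under assumed names (`ToyTable`, `countViaRows`, `countViaPlan` are all hypothetical); it is not how Spark implements either path.

```scala
// Toy model of a columnar table: each column is stored and read separately.
// Illustration only, not Spark's implementation.
final class ToyTable(columns: Map[String, Vector[Any]], val numRows: Int) {
  // Tracks which columns have been read, so the two strategies can be compared.
  private var touched = Set.empty[String]

  def readColumn(name: String): Vector[Any] = {
    touched += name
    columns(name)
  }

  def columnNames: Seq[String] = columns.keys.toSeq
  def columnsRead: Set[String] = touched
}

object CountDemo {
  // Row-oriented count: every row is materialized, so every column is read.
  def countViaRows(table: ToyTable): Long = {
    val cols = table.columnNames.map(table.readColumn)
    (0 until table.numRows).map(i => cols.map(_(i))).size.toLong
  }

  // Plan-aware count: counting needs no column values, so none are read.
  def countViaPlan(table: ToyTable): Long = table.numRows.toLong

  def newTable(): ToyTable = new ToyTable(
    Map("id" -> Vector(1, 2, 3), "name" -> Vector("a", "b", "c")),
    numRows = 3
  )
}
```

Both functions return the same count, but only the row-oriented one touches column data, which is the cost the comment above is pointing at.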




[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4066#issuecomment-72770879
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26691/
Test PASSed.



[GitHub] spark pull request: [FIX][MLLIB] fix seed handling in Python GMM

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4349#issuecomment-72771073
  
  [Test build #26696 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26696/consoleFull)
 for   PR 4349 at commit 
[`3be5926`](https://github.com/apache/spark/commit/3be592612f9e4b5b6a1fbc2bf84ac006fa223bfb).
 * This patch merges cleanly.



[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...

2015-02-03 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4336#discussion_r24055512
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala 
---
@@ -260,11 +260,11 @@ private[sql] class DataFrameImpl protected[sql](
 
   override def take(n: Int): Array[Row] = head(n)
 
-  override def collect(): Array[Row] = queryExecution.executedPlan.executeCollect()
+  override def collect(): Array[Row] = rdd.collect()
 
   override def collectAsList(): java.util.List[Row] = java.util.Arrays.asList(rdd.collect() :_*)
 
-  override def count(): Long = groupBy().count().rdd.collect().head.getLong(0)
+  override def count(): Long = rdd.count()
--- End diff --

You should always go through the optimizer :)



[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...

2015-02-03 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4336#discussion_r24055684
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala 
---
@@ -260,11 +260,11 @@ private[sql] class DataFrameImpl protected[sql](
 
   override def take(n: Int): Array[Row] = head(n)
 
-  override def collect(): Array[Row] = queryExecution.executedPlan.executeCollect()
+  override def collect(): Array[Row] = rdd.collect()
 
   override def collectAsList(): java.util.List[Row] = java.util.Arrays.asList(rdd.collect() :_*)
 
-  override def count(): Long = groupBy().count().rdd.collect().head.getLong(0)
+  override def count(): Long = rdd.count()
--- End diff --

Ok, that makes sense, thanks for the explanation. :)



[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3798#issuecomment-72617088
  
I took a pass through the public API. I'm not very familiar with Kafka, so
it was somewhat slow going. However, some reactions:

1. We should try to tighten, simplify, and clarify the way we name and
document everything in this public API. Most of the comments were about this.
The most important, IMO, is coming up with a good name for the new streams
returned and clearly explaining how they differ from the old Kafka stream. To
me, the main differences seem to be in the way we (a) decide what goes into
which batch and (b) actually ingest the data. I proposed Javadoc and a naming
scheme that emphasize that distinction.
2. Are there plans to add Java and Python wrappers here next? Those are
straightforward and it would be good to have them. Maybe in a follow-on PR?



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72617481
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26643/
Test FAILed.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72617479
  
  [Test build #26643 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26643/consoleFull)
 for   PR 4331 at commit 
[`9ab78b4`](https://github.com/apache/spark/commit/9ab78b4262961deafe0256c8c28d2911a4c07b0a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...

2015-02-03 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4331#issuecomment-72617632
  
  [Test build #26646 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26646/consoleFull)
 for   PR 4331 at commit 
[`78ebcfa`](https://github.com/apache/spark/commit/78ebcfa6ba750e081f6b5c7b07c8d04f32c2d4d6).
 * This patch merges cleanly.


