[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15432
  
Thanks @gatorsmile!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15432
  
**[Test build #68185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68185/consoleFull)**
 for PR 15432 at commit 
[`9b9a49f`](https://github.com/apache/spark/commit/9b9a49f80c216e00c526a9e07e9f764c1409bd10).





[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658659
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

Yes, makes sense. I will fix them here first.





[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658562
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

If you meant something like `Literal(Long: Utils.random.nextLong()))`, I 
guess both are fine, judging from the discussion in PR 12452.





[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658461
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

The compiler seems to complain if we specify the return type in `def this`.
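
For context, a minimal sketch of the Scala rule behind this remark. The names `Expr`, `Lit`, and `RandLike` are hypothetical stand-ins, not Spark's actual `Expression`, `Literal`, and `Rand`. An auxiliary constructor (`def this`) cannot carry a result type annotation, whereas a companion `apply` overload is an ordinary method and can.

```scala
// Hypothetical stand-ins for Catalyst's Expression and Literal; not Spark's real classes.
sealed trait Expr
final case class Lit(value: Long) extends Expr

case class RandLike(child: Expr) {
  // Auxiliary constructor: Scala's grammar has no place for a result type here,
  // so writing `def this(): RandLike = ...` is rejected by the compiler.
  // The fixed seed 42L is illustrative; the PR uses a random Long.
  def this() = this(Lit(42L))
}

object RandLike {
  // An ordinary method, so an explicit result type is allowed.
  def apply(seed: Long): RandLike = RandLike(Lit(seed))
}
```

With this shape, `RandLike(7L)` and `RandLike(Lit(7L))` build equal values, and `new RandLike()` goes through the auxiliary constructor.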





[GitHub] spark issue #15767: [SPARK-18269][SQL] CSV datasource should read null prope...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15767
  
**[Test build #68184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68184/consoleFull)**
 for PR 15767 at commit 
[`b913eac`](https://github.com/apache/spark/commit/b913eac5f5e3559ed26c02c61f62b556b86413f4).





[GitHub] spark issue #15767: [SPARK-18269][SQL] CSV datasource should read null prope...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15767
  
**[Test build #68182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68182/consoleFull)**
 for PR 15767 at commit 
[`c0667d1`](https://github.com/apache/spark/commit/c0667d1e31e6eed04bb0be1be0eef0f86fba1bb7).





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68183 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68183/consoleFull)**
 for PR 11105 at commit 
[`6371363`](https://github.com/apache/spark/commit/63713636c7e808b31783c60b273bb8225fee8af1).





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15637
  
We already have those, don't we? Spark's own hash expression.
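
As a rough illustration of hashing a fractional value, here is a minimal sketch that hashes a `Double` through its IEEE 754 bit pattern using the Scala standard library's `MurmurHash3`. This is an assumption-laden example, not Spark's hash expression: the object name `FractionalHash`, the method `hashDouble`, and the seed 42 are all hypothetical.

```scala
import java.nio.ByteBuffer
import scala.util.hashing.MurmurHash3

object FractionalHash {
  // Hash a Double via its 64-bit IEEE 754 representation.
  // The default seed is an arbitrary illustrative choice.
  def hashDouble(d: Double, seed: Int = 42): Int = {
    val bits = java.lang.Double.doubleToLongBits(d)
    // Serialize the 8 bytes of the bit pattern and hash them.
    MurmurHash3.bytesHash(ByteBuffer.allocate(8).putLong(bits).array(), seed)
  }
}
```

One caveat with any bit-pattern approach: `0.0` and `-0.0` compare equal as doubles but have different bit patterns, so they are hashed from different inputs unless they are normalized first.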

On Friday, November 4, 2016, Zhenhua Wang  wrote:

> In that way, we can only get the hashed value instead of real value of the
> column, right? So I think we still need to implement a hash code for
> fractional types.






[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15132
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15132
  
**[Test build #68179 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68179/consoleFull)**
 for PR 15132 at commit 
[`618680f`](https://github.com/apache/spark/commit/618680fbcf4a7c9ef2a3ae52e0dba682d84c9b45).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class PerPartitionConfig() extends Serializable `





[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68179/
Test PASSed.





[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14750
  
**[Test build #68181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68181/consoleFull)**
 for PR 14750 at commit 
[`3bd9362`](https://github.com/apache/spark/commit/3bd9362062b41ad64146cd24936501f4dfd65b68).





[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15766
  
Only the owner (yourself) can close this PR.





[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15432
  
LGTM except for a few minor comments.





[GitHub] spark pull request #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash...

2016-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14498#discussion_r86658169
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -487,24 +487,6 @@ private[hive] class TestHiveQueryExecution(
   }
 }
 
-
-private[hive] class TestHiveFunctionRegistry extends 
SimpleFunctionRegistry {
--- End diff --

We can still remove this class if we manually add back the removed Spark 
built-in hash function, right?





[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658154
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

Why not specify the data type here?





[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658156
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -87,6 +87,10 @@ case class Rand(seed: Long) extends RDG {
   }
 }
 
+object Rand {
+  def apply(seed: Long): Rand = Rand(Literal(seed))
--- End diff --

The same here?





[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658157
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -64,17 +66,15 @@ abstract class RDG extends LeafExpression with 
Nondeterministic {
0.9629742951434543
   > SELECT _FUNC_(0);
0.8446490682263027
+  > SELECT _FUNC_(null);
+   0.8446490682263027
   """)
 // scalastyle:on line.size.limit
-case class Rand(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextDouble()
+case class Rand(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

The same here?





[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12135
  
**[Test build #68180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68180/consoleFull)**
 for PR 12135 at commit 
[`446013a`](https://github.com/apache/spark/commit/446013a715e0ddba3592b600da20eb14cda6bab0).





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68178/
Test FAILed.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68178 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68178/consoleFull)**
 for PR 11105 at commit 
[`423904e`](https://github.com/apache/spark/commit/423904e3ca89f9c05cd5d554401d49f471457178).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15132
  
**[Test build #68179 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68179/consoleFull)**
 for PR 15132 at commit 
[`618680f`](https://github.com/apache/spark/commit/618680fbcf4a7c9ef2a3ae52e0dba682d84c9b45).





[GitHub] spark pull request #15767: [SPARK-18269][SQL] CSV datasource should read nul...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15767#discussion_r86658040
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -232,7 +232,7 @@ private[csv] object CSVTypeCast {
   nullable: Boolean = true,
   options: CSVOptions = CSVOptions()): Any = {
 
-if (nullable && datum == options.nullValue) {
+if (datum == null || nullable && datum == options.nullValue) {
--- End diff --

Sure, makes sense.
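
To make the changed condition concrete, here is a minimal sketch of the same boolean logic in isolation. The object `NullCheck`, the method `isNullDatum`, and the `nullValue` parameter (standing in for `options.nullValue`) are hypothetical and simplified, not the actual `CSVTypeCast` code. Because `&&` binds tighter than `||` in Scala, the expression in the diff parses the way the explicit parentheses below show: a `null` datum is treated as null regardless of `nullable`.

```scala
object NullCheck {
  // Hypothetical, simplified form of the check from the diff.
  // `nullValue` stands in for options.nullValue.
  def isNullDatum(datum: String, nullable: Boolean, nullValue: String): Boolean =
    // Equivalent to: datum == null || nullable && datum == nullValue
    datum == null || (nullable && datum == nullValue)
}
```

Under this reading, a datum equal to `nullValue` is only treated as null when the field is nullable, while a literal `null` is always treated as null.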





[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15132
  
@rxin thanks, changed to an abstract class. If you think that's sufficient 
future-proofing, I otherwise think this is a worthwhile change; it seems to 
meet a real user need.
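
A minimal sketch of what such an abstract class could look like. The class names and both method signatures here are hypothetical and simplified; only `abstract class PerPartitionConfig() extends Serializable` is confirmed by the build report in this thread. One plausible reading of the future-proofing concern (an assumption, not stated in this thread) is that members added later can ship with a default body, so existing user subclasses do not need to change.

```scala
// Hypothetical simplified shape; method names and signatures are illustrative only.
abstract class PerPartitionConfigSketch extends Serializable {
  // Abstract: every implementation must supply a rate for a partition.
  def maxRatePerPartition(topic: String, partition: Int): Long

  // A member added in a later release can come with a default body,
  // so subclasses written against the earlier version need no changes.
  def minRatePerPartition(topic: String, partition: Int): Long = 1L
}

// A user-defined implementation that only overrides the original abstract method.
class FixedRate(rate: Long) extends PerPartitionConfigSketch {
  override def maxRatePerPartition(topic: String, partition: Int): Long = rate
}
```

Here `new FixedRate(100L)` answers `maxRatePerPartition` with its own value and falls back to the default for `minRatePerPartition`.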





[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15693
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68173/
Test PASSed.





[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15693
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15693
  
**[Test build #68173 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68173/consoleFull)**
 for PR 15693 at commit 
[`448abfa`](https://github.com/apache/spark/commit/448abfac34d5454a3bc74e0a4df11736713179b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14660
  
@rxin, I thought reading a small file was a plausible corner case. In that 
case, it would not be just a small fraction. Improving it without a regression 
could be a legitimate optimization, since it avoids launching another job.

It is true that in such cases (or in most cases) users would simply tolerate 
the elapsed time, but I believe we have added improvements like this from time 
to time to improve performance in general, bit by bit.

`spark.sql.sources.parallelPartitionDiscovery.threshold` might be an 
option for similar cases.

I believe this PR does not add a lot of new code to maintain; it mostly 
reuses the existing code.







[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68178/consoleFull)**
 for PR 11105 at commit 
[`423904e`](https://github.com/apache/spark/commit/423904e3ca89f9c05cd5d554401d49f471457178).





[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14750
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68176/
Test FAILed.





[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14750
  
**[Test build #68176 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68176/consoleFull)**
 for PR 14750 at commit 
[`c057d0c`](https://github.com/apache/spark/commit/c057d0c027199b43a24df853c26b755aeaf07c94).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14750
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/15766
  
@cloud-fan I would appreciate it if you could help close this PR.





[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/15766
  
@rxin @cloud-fan you are right, hash should be unregistered and replaced 
with Hive's hash, or we could put the failed hash test case into the blacklist, 
as in @gatorsmile's work in #14498. I will close the PR.





[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r86657628
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -583,6 +633,50 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
 tableWithStats.copy(properties = getOriginalTableProperties(table))
   }
 
+  private def restoreHiveSerdeTable(table: CatalogTable): CatalogTable = {
+val hiveTable = table.copy(
+  provider = Some(DDLUtils.HIVE_PROVIDER),
+  tracksPartitionsInCatalog = true)
+
+val schemaFromTableProps = getSchemaFromTableProperties(table)
+if (DataType.equalsIgnoreCaseAndNullability(schemaFromTableProps, 
table.schema)) {
+  hiveTable.copy(
+schema = schemaFromTableProps,
+partitionColumnNames = 
getPartitionColumnsFromTableProperties(table),
+bucketSpec = getBucketSpecFromTableProperties(table))
+} else {
+  // Hive metastore may change the table schema, e.g. schema 
inference. If the table
+  // schema we read back is different(ignore case and nullability) 
from the one in table
+  // properties which was written when creating table, we should 
respect the table schema
+  // from hive.
+  logWarning(s"The table schema given by Hive 
metastore(${table.schema.simpleString}) is " +
+"different from the schema when this table was created by Spark 
SQL" +
+s"(${schemaFromTableProps.simpleString}). We have to trust the 
table schema from Hive " +
--- End diff --

Nit: clarify that we are falling back to the hive metastore schema. 
Trusting it sounds a little ambiguous.





[GitHub] spark pull request #15746: [SPARK-18239][SPARKR] Gradient Boosted Tree for R

2016-11-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15746#discussion_r86657586
  
--- Diff: R/pkg/R/mllib.R ---
@@ -1828,13 +1849,13 @@ setMethod("summary", signature(object = 
"RandomForestRegressionModel"),
 #' @note summary(RandomForestClassificationModel) since 2.1.0
 setMethod("summary", signature(object = "RandomForestClassificationModel"),
   function(object) {
-ans <- summary.randomForest(object)
+ans <- summary.treeEnsemble(object)
 class(ans) <- "summary.RandomForestClassificationModel"
 ans
   })
 
 #  Prints the summary of Random Forest Regression Model
-print.summary.randomForest <- function(x) {
+print.summary.treeEnsemble <- function(x) {
--- End diff --

For example, summary on `rpart` model shows both error and node-by-node 
information. I think it is still useful this way
```
Call:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis,
method = "class")
  n= 81

          CP nsplit rel error    xerror      xstd
1 0.17647059      0 1.0000000 1.0000000 0.2155872
2 0.01960784      1 0.8235294 0.9411765 0.2107780
3 0.01000000      4 0.7647059 1.0588235 0.2200975

Variable importance
 Start    Age Number
    64     24     12

Node number 1: 81 observations,complexity param=0.1764706
  predicted class=absent   expected loss=0.2098765  P(node) =1
    class counts:    64    17
   probabilities: 0.790 0.210
  left son=2 (62 obs) right son=3 (19 obs)
  Primary splits:
  Start  < 8.5  to the right, improve=6.762330, (0 missing)
  Number < 5.5  to the left,  improve=2.866795, (0 missing)
  Age< 39.5 to the left,  improve=2.250212, (0 missing)
  Surrogate splits:
  Number < 6.5  to the left,  agree=0.802, adj=0.158, (0 split)

Node number 2: 62 observations,complexity param=0.01960784
  predicted class=absent   expected loss=0.09677419  P(node) =0.7654321
class counts:56 6
   probabilities: 0.903 0.097
  left son=4 (29 obs) right son=5 (33 obs)
  Primary splits:
  Start  < 14.5 to the right, improve=1.0205280, (0 missing)
  Age< 55   to the left,  improve=0.6848635, (0 missing)
  Number < 4.5  to the left,  improve=0.2975332, (0 missing)
  Surrogate splits:
  Number < 3.5  to the left,  agree=0.645, adj=0.241, (0 split)
  Age< 16   to the left,  agree=0.597, adj=0.138, (0 split)

Node number 3: 19 observations
  predicted class=present  expected loss=0.4210526  P(node) =0.2345679
    class counts:     8    11
   probabilities: 0.421 0.579

Node number 4: 29 observations
  predicted class=absent   expected loss=0  P(node) =0.3580247
class counts:29 0
   probabilities: 1.000 0.000

Node number 5: 33 observations,complexity param=0.01960784
  predicted class=absent   expected loss=0.1818182  P(node) =0.4074074
class counts:27 6
   probabilities: 0.818 0.182
  left son=10 (12 obs) right son=11 (21 obs)
  Primary splits:
  Age< 55   to the left,  improve=1.2467530, (0 missing)
  Start  < 12.5 to the right, improve=0.2887701, (0 missing)
  Number < 3.5  to the right, improve=0.1753247, (0 missing)
  Surrogate splits:
  Start  < 9.5  to the left,  agree=0.758, adj=0.333, (0 split)
  Number < 5.5  to the right, agree=0.697, adj=0.167, (0 split)

Node number 10: 12 observations
  predicted class=absent   expected loss=0  P(node) =0.1481481
class counts:12 0
   probabilities: 1.000 0.000

Node number 11: 21 observations,complexity param=0.01960784
  predicted class=absent   expected loss=0.2857143  P(node) =0.2592593
class counts:15 6
   probabilities: 0.714 0.286
  left son=22 (14 obs) right son=23 (7 obs)
  Primary splits:
  Age< 111  to the right, improve=1.71428600, (0 missing)
  Start  < 12.5 to the right, improve=0.79365080, (0 missing)
  Number < 3.5  to the right, improve=0.07142857, (0 missing)

Node number 22: 14 observations
  predicted class=absent   expected loss=0.1428571  P(node) =0.1728395
class counts:12 2
   probabilities: 0.857 0.143

Node number 23: 7 observations
  predicted class=present  expected loss=0.4285714  P(node) =0.08641975
class counts: 3 4
   probabilities: 0.429 0.571
```



[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68175/
Test PASSed.





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #68175 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68175/consoleFull)**
 for PR 15314 at commit 
[`ad09fec`](https://github.com/apache/spark/commit/ad09fec26875c99181278137ee0f3f94313782f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68177/
Test FAILed.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68177/consoleFull)**
 for PR 11105 at commit 
[`e35e4a1`](https://github.com/apache/spark/commit/e35e4a165a5e7c54486c45dad9531250e2aebcf2).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #12904: [SPARK-15125][SQL] Changing CSV data source mapping of e...

2016-11-04 Thread sureshthalamati
Github user sureshthalamati commented on the issue:

https://github.com/apache/spark/pull/12904
  
I was testing the fix with the different scenarios mentioned in the comments. I 
cannot make the CSV writer write a quoted empty string for empty strings in the 
data. One of the issues I filed got fixed, but I still cannot make it work.

https://github.com/uniVocity/univocity-parsers/issues/123






[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
In that way, we can only get the hashed value instead of real value of the 
column, right? So I think we still need to implement a hash code for fractional 
types.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68177 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68177/consoleFull)**
 for PR 11105 at commit 
[`e35e4a1`](https://github.com/apache/spark/commit/e35e4a165a5e7c54486c45dad9531250e2aebcf2).





[GitHub] spark issue #15776: [SPARK-17710][Follow UP] Add comments to state why 'Util...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15776
  
**[Test build #3416 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3416/consoleFull)**
 for PR 15776 at commit 
[`145b31c`](https://github.com/apache/spark/commit/145b31cd152679d4dbd4e17d4d31d4d925df6cc8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14750
  
**[Test build #68176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68176/consoleFull)**
 for PR 14750 at commit 
[`c057d0c`](https://github.com/apache/spark/commit/c057d0c027199b43a24df853c26b755aeaf07c94).





[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15763
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68170/
Test PASSed.





[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15763
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15637
  
Yeah, good point. The underlying implementation really only needs a hash 
code, so it would be trivial to support all types. Even easier, I think you 
can just compute the hash (using a SQL expression) and pass it in to support all 
data types.
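
To illustrate the idea of hashing first and passing the result in, here is a 
minimal sketch. It assumes a toy count-min sketch keyed on `Long` values; this 
is not Spark's `CountMinSketch`, and `ToyCountMinSketch`, `FractionalKeys` and 
`keyOf` are hypothetical names. `java.lang.Double.doubleToLongBits` stands in 
for the SQL hash expression, so fractional values can be fed to a structure 
that only takes integral keys.

```scala
import scala.util.hashing.MurmurHash3

// Toy count-min sketch over Long keys. Illustration only, not Spark's implementation.
final class ToyCountMinSketch(depth: Int, width: Int) {
  private val table = Array.ofDim[Long](depth, width)

  // Column for `key` in `row`; the row index seeds the hash so rows differ.
  private def column(key: Long, row: Int): Int = {
    val mixed = MurmurHash3.mix(MurmurHash3.mix(row, key.toInt), (key >>> 32).toInt)
    java.lang.Math.floorMod(MurmurHash3.finalizeHash(mixed, 2), width)
  }

  def add(key: Long, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(column(key, row)) += count

  // Minimum over rows: never below the true count, possibly above it on collisions.
  def estimate(key: Long): Long =
    (0 until depth).map(row => table(row)(column(key, row))).min
}

object FractionalKeys {
  // Assumption: the raw IEEE 754 bits act as the "hash" of a double.
  // Note that 0.0 and -0.0 map to different keys.
  def keyOf(value: Double): Long = java.lang.Double.doubleToLongBits(value)

  def main(args: Array[String]): Unit = {
    val sketch = new ToyCountMinSketch(depth = 4, width = 64)
    Seq(1.5, 1.5, 2.25, 1.5).foreach(v => sketch.add(keyOf(v)))
    println(s"estimated count of 1.5: ${sketch.estimate(keyOf(1.5))}")
  }
}
```

As noted earlier in the thread, the sketch then only knows the hashed key, not 
the original column value, so the caller has to keep the value-to-key mapping 
if real values are needed.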





[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15763
  
**[Test build #68170 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68170/consoleFull)**
 for PR 15763 at commit 
[`1c1864c`](https://github.com/apache/spark/commit/1c1864caa764130f947be9ccd2b132d4ac75ec2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
Yes, but its implementation actually supports only the above types.
```
  @Override
  public void add(Object item, long count) {
if (item instanceof String) {
  addString((String) item, count);
} else {
  addLong(Utils.integralToLong(item), count);
}
  }
```





[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15779
  
+1 on removing the use of exceptions. I thought it was a bit of an awkward 
solution to begin with. Thanks a lot for this pr, I will take a look soon.





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15637
  
It supports arbitrary objects

```
  /**
   * Increments {@code item}'s count by one.
   */
  public abstract void add(Object item);
```





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
OK, I'll try to use count min sketch. BTW, seems it only supports Integral 
and String types for now?





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #68175 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68175/consoleFull)**
 for PR 15314 at commit 
[`ad09fec`](https://github.com/apache/spark/commit/ad09fec26875c99181278137ee0f3f94313782f2).





[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15766
  
I am working on the related issue in 
https://github.com/apache/spark/pull/14498 





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15314
  
Typo fixed.





[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15766
  
@rxin  good catch! We do unregister the spark builtin hash in test: 
https://github.com/apache/spark/blob/master/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala#L60-L61

So we have a little more work to do here, we should register the hive hash 
function in that test suite.





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15637
  
It supports any data types, uses less space and can provide probabilistic 
frequencies for a large number of distinct values, and it's already implemented 
in Spark. We just need to add a wrapper for it to turn it into an aggregate 
function.





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
Oh, do you mean using count min sketch for equi-width histogram? Actually I 
don't know about the sketch, I need to look into it to see if it's easy to use 
as an agg function and also support float/double/decimal types. But I do think 
this pr takes much less effort to implement what we want :)





[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r86656669
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -521,15 +521,15 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 val catalogTable =
   sessionState.catalog.getTableMetadata(TableIdentifier(tableName))
 relation match {
-  case LogicalRelation(r: HadoopFsRelation, _, Some(table)) =>
+  case LogicalRelation(r: HadoopFsRelation, _, _) =>
 if (!isDataSourceTable) {
   fail(
 s"${classOf[MetastoreRelation].getCanonicalName} is expected, 
but found " +
   s"${HadoopFsRelation.getClass.getCanonicalName}.")
 }
 userSpecifiedLocation match {
   case Some(location) =>
-assert(table.storage.locationUri.get === location)
+assert(r.options("path") === location)
--- End diff --

it's also a follow-up: 
https://github.com/apache/spark/pull/15024/files#r86273774





[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r86656654
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -442,7 +468,9 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   private def updateLocationInStorageProps(
   table: CatalogTable,
   newPath: Option[String]): CatalogStorageFormat = {
-val propsWithoutPath = 
table.storage.properties.filterKeys(_.toLowerCase != "path")
+val propsWithoutPath = table.storage.properties.filter {
+  case (k, v) => k.toLowerCase != "path"
+}
--- End diff --

no, the map returned by `filterKeys` is not serializable, I'll add some 
comment
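
A minimal sketch of the point above, assuming plain Java serialization is what matters here. `StoragePropsSketch`, `isJavaSerializable` and `withoutPath` are hypothetical names for illustration, not Spark code. A strict `filter` builds a new immutable map, which can be written to an `ObjectOutputStream`:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object StoragePropsSketch {
  // Returns true if the value can be written with plain Java serialization.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // Strict filter: builds a new immutable Map instead of a lazy wrapper.
  def withoutPath(props: Map[String, String]): Map[String, String] =
    props.filter { case (k, _) => k.toLowerCase != "path" }
}
```

On the Scala 2.11/2.12 collections, `filterKeys` is expected to return a lazy wrapper around the original map, so passing its result to the same check would likely report `false`; that behaviour is version-dependent and is not asserted here.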





[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r86656644
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -417,11 +437,17 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   override def renameTable(db: String, oldName: String, newName: String): 
Unit = withClient {
-val rawTable = client.getTable(db, oldName)
-
-val storageWithNewPath = if (rawTable.tableType == MANAGED) {
-  // If it's a managed table and we are renaming it, then the path 
option becomes inaccurate
-  // and we need to update it according to the new table name.
+val rawTable = getRawTable(db, oldName)
+
+// Note that Hive serde tables don't use path option in storage 
properties to store the value
+// of table location, but use `locationUri` field to store it 
directly. And `locationUri` field
+// will be updated automatically in Hive metastore by the `alterTable` 
call at the end of this
+// method. Here we only update the path option if the path option 
already exists in storage
+// properties, to avoid adding a unnecessary path option for Hive 
serde tables.
+val hasPathOption = new 
CaseInsensitiveMap(rawTable.storage.properties).contains("path")
--- End diff --

not related, but it's a follow up of 
https://github.com/apache/spark/pull/15024/files#diff-159191585e10542f013cb3a714f26075R422
 , which may add an extra path option to Hive serde tables.
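
The rule described in the diff comment (only rewrite the path option when one already exists, compared case-insensitively) can be sketched on a plain `Map`. This is an illustration only: `PathOptionSketch` and `updatePathIfPresent` are hypothetical names, and the real code uses Spark's `CaseInsensitiveMap` and `CatalogStorageFormat` instead.

```scala
object PathOptionSketch {
  // Replace the "path" entry (any casing) with newPath, but only if one exists.
  // Maps without a path option, e.g. Hive serde tables, are returned unchanged.
  def updatePathIfPresent(props: Map[String, String], newPath: String): Map[String, String] = {
    val hasPathOption = props.keys.exists(_.equalsIgnoreCase("path"))
    if (hasPathOption) {
      val propsWithoutPath = props.filter { case (k, _) => !k.equalsIgnoreCase("path") }
      propsWithoutPath.updated("path", newPath)
    } else {
      props
    }
  }
}
```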





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
For equi-height histograms, we need extra info like the ndv's in each bin. Does 
count-min sketch also have this information? I had a discussion with Herman and 
Tim about histogram construction before; we want to compute all stats, including 
histograms, in a single table scan. They suggested we modify 
ApproximatePercentile to get equi-height histograms, i.e. endpoints of bins and 
the ndv's within them. This pr is only for getting equi-width histograms. 
You can check [this JIRA](https://issues.apache.org/jira/browse/SPARK-17074) 
for the discussion.





[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...

2016-11-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15756





[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15779
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15779
  
**[Test build #68174 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68174/consoleFull)**
 for PR 15779 at commit 
[`2bee341`](https://github.com/apache/spark/commit/2bee34103555465b826ab72160748bae34c81568).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15779
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68174/
Test FAILed.





[GitHub] spark issue #15756: [SPARK-18256] Improve the performance of event log repla...

2016-11-04 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15756
  
Cool. Merging to master!





[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15779
  
@sethah @yanboliang I just saw your PRs for SPARK-17748.  Awesome change.  
I just saw a few nits along the way.

The only major item is making SingularMatrixException `private[ml]`.  This 
would be the first public Exception type we have in MLlib, I believe.  If it is 
to be public, I'd prefer it live in spark.ml.linalg, so we could move it there 
instead of making it private.

Also, I'm not a big fan of using Exceptions to handle logic.  I'd prefer to 
change the use of CholeskySolver to pass the failure code, rather than relying 
on exceptions.  If that sounds good to you, then I'll make a follow up JIRA for 
it.
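
As a rough illustration of "pass the failure code" instead of throwing, here is a self-contained sketch of a dense Cholesky solve that returns an `Either`. It does not reflect Spark's actual CholeskySolver or its LAPACK-based implementation; `CholeskySketch`, `SolveFailure` and `NotPositiveDefinite` are hypothetical names.

```scala
object CholeskySketch {
  sealed trait SolveFailure
  // The pivot at the given row was not strictly positive, so A is singular
  // or not positive definite.
  final case class NotPositiveDefinite(pivotRow: Int) extends SolveFailure

  // Solves A x = b for a symmetric matrix A, reporting failure as a value.
  def solve(a: Array[Array[Double]], b: Array[Double]): Either[SolveFailure, Array[Double]] = {
    val n = b.length
    val l = Array.ofDim[Double](n, n)
    var i = 0
    while (i < n) {
      var j = 0
      while (j <= i) {
        var sum = a(i)(j)
        var k = 0
        while (k < j) { sum -= l(i)(k) * l(j)(k); k += 1 }
        if (i == j) {
          if (sum <= 0.0) return Left(NotPositiveDefinite(i))
          l(i)(i) = math.sqrt(sum)
        } else {
          l(i)(j) = sum / l(j)(j)
        }
        j += 1
      }
      i += 1
    }
    // Forward substitution: L y = b.
    val y = new Array[Double](n)
    for (r <- 0 until n) {
      var s = b(r)
      for (k <- 0 until r) s -= l(r)(k) * y(k)
      y(r) = s / l(r)(r)
    }
    // Back substitution: L^T x = y.
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      var s = y(r)
      for (k <- r + 1 until n) s -= l(k)(r) * x(k)
      x(r) = s / l(r)(r)
    }
    Right(x)
  }
}
```

A caller can then pattern match on `Left(NotPositiveDefinite(_))` to fall back to another solver, rather than catching an exception.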

Thanks!





[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15779
  
**[Test build #68174 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68174/consoleFull)**
 for PR 15779 at commit 
[`2bee341`](https://github.com/apache/spark/commit/2bee34103555465b826ab72160748bae34c81568).





[GitHub] spark pull request #15779: [SPARK-17748][ML] Minor cleanups to one-pass line...

2016-11-04 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/15779

[SPARK-17748][ML] Minor cleanups to one-pass linear regression with elastic 
net

## What changes were proposed in this pull request?

* Made SingularMatrixException `private[ml]`
* WeightedLeastSquares: Changed to allow tol >= 0 instead of only tol > 0

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark wls-cleanups

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15779


commit 2bee34103555465b826ab72160748bae34c81568
Author: Joseph K. Bradley 
Date:   2016-11-05T01:35:44Z

Minor cleanups.  Made SingularMatrixException private ml







[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15637
  
Why not use count min sketch then? You would get more signal (with some 
error) for a much larger range of values.






[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
Yes, in our design, an equi-width histogram is a seq of single-valued bins, 
which is used for columns with low cardinality, so that we can get accurate 
estimation. When cardinality is high, an equi-height histogram is better than 
an equi-width histogram (with some fixed width), especially in data skew scenarios, 
where the skewed value would be hidden in an equi-width bin, while in an 
equi-height histogram, it would occupy many contiguous bins.
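
To make the skew argument concrete, here is a small in-memory sketch of an equi-height histogram that records each bin's endpoints and ndv. It sorts the whole input, which a real single-scan aggregate would not do, and `EquiHeightSketch`, `Bin` and `histogram` are hypothetical names rather than anything from this pr.

```scala
object EquiHeightSketch {
  // One bin: smallest value, largest value, and number of distinct values (ndv).
  final case class Bin(lo: Double, hi: Double, ndv: Int)

  // Splits the sorted values into numBins slices of (nearly) equal row count.
  def histogram(values: Seq[Double], numBins: Int): Seq[Bin] = {
    require(numBins > 0, "numBins must be positive")
    val sorted = values.sorted
    val n = sorted.length
    (0 until numBins)
      .map(b => sorted.slice(b * n / numBins, (b + 1) * n / numBins))
      .filter(_.nonEmpty)
      .map(s => Bin(s.head, s.last, s.distinct.length))
  }
}
```

With the ten values 1, 2, 9, 10 and six copies of 5, asking for five bins gives three contiguous bins whose endpoints are both 5 and whose ndv is 1, so the skewed value stays visible instead of being averaged into a wider bin.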





[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15693
  
**[Test build #68173 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68173/consoleFull)**
 for PR 15693 at commit 
[`448abfa`](https://github.com/apache/spark/commit/448abfac34d5454a3bc74e0a4df11736713179b9).





[GitHub] spark issue #15726: [SPARK-18107][SQL][FOLLOW-UP] Insert overwrite statement...

2016-11-04 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15726
  
@viirya that makes sense to me


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15693: [SPARK-18125][SQL] Fix a compilation error in cod...

2016-11-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15693#discussion_r86656043
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala
 ---
@@ -63,15 +63,33 @@ case class ReferenceToExpressions(result: Expression, 
children: Seq[Expression])
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val childrenGen = children.map(_.genCode(ctx))
-val childrenVars = childrenGen.zip(children).map {
-  case (childGen, child) => LambdaVariable(childGen.value, 
childGen.isNull, child.dataType)
-}
+val (childrenVars, classChildrenVars) = childrenGen.zip(children).map {
+  case (childGen, child) =>
+val childVar = LambdaVariable(childGen.value, childGen.isNull, 
child.dataType)
+
+// SPARK-18125: The children vars are local variables. If the 
result expression uses
+// splitExpression, those variables cannot be accessed so 
compilation fails.
+// To fix it, we use class variables to hold those local variables.
+val classChildVarName = ctx.freshName("classChildVar")
+val classChildVarIsNull = ctx.freshName("classChildVarIsNull")
+ctx.addMutableState(ctx.javaType(childVar.dataType), 
classChildVarName, "")
+ctx.addMutableState("boolean", classChildVarIsNull, "")
+val classChildVar =
+  LambdaVariable(classChildVarName, classChildVarIsNull, 
childVar.dataType)
+
+(childVar, classChildVar)
+}.unzip
+
+val initClassChildrenVars = classChildrenVars.zipWithIndex.map { case 
(classChildrenVar, i) =>
--- End diff --

I just want to avoid tangling all the code together. I made the change to 
generate the code in the previous block. Please see whether it is better.





[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15769
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15769
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68168/
Test FAILed.





[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15769
  
**[Test build #68168 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68168/consoleFull)**
 for PR 15769 at commit 
[`4e72745`](https://github.com/apache/spark/commit/4e72745da6e691d5f006184e85998e7519fab50b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68172 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68172/consoleFull)**
 for PR 11105 at commit 
[`3a1ab67`](https://github.com/apache/spark/commit/3a1ab6716dfde678aa4e03a6c4db6fa17d544fc7).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68172/
Test FAILed.





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15637
  
How would this pr help you with equi-width histogram? This function just 
gives you the frequency count for keys when the cardinality is low.
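
For readers following along, a frequency count over keys with low cardinality can be sketched as below. The cap on distinct keys (`maxDistinct`) is an assumption added for illustration, as are the names `FrequencySketch` and `countIfLowCardinality`; the thread does not say how the pr bounds the map.

```scala
import scala.collection.mutable

object FrequencySketch {
  // Exact per-key counts in one pass, or None once more than maxDistinct
  // different keys have been seen (assumed cut-off, see lead-in).
  def countIfLowCardinality[K](keys: Iterator[K], maxDistinct: Int): Option[Map[K, Long]] = {
    val counts = mutable.HashMap.empty[K, Long]
    while (keys.hasNext) {
      val k = keys.next()
      counts.update(k, counts.getOrElse(k, 0L) + 1L)
      if (counts.size > maxDistinct) return None
    }
    Some(counts.toMap)
  }
}
```

Because the map is built in a single pass and needs no prior ndv, it could in principle be computed alongside other column stats in the same scan.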






[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68171 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68171/consoleFull)**
 for PR 11105 at commit 
[`6983bfa`](https://github.com/apache/spark/commit/6983bfa5131ccaa5c4d9ebfb86de7871d4c6a60f).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68171/
Test FAILed.





[GitHub] spark issue #15726: [SPARK-18107][SQL][FOLLOW-UP] Insert overwrite statement...

2016-11-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15726
  
@ericl Currently I prefer the first option: let `HiveClientImpl` create 
multiple internal thrift clients, since I'd rather not change the external 
catalog for this.





[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68172 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68172/consoleFull)**
 for PR 11105 at commit 
[`3a1ab67`](https://github.com/apache/spark/commit/3a1ab6716dfde678aa4e03a6c4db6fa17d544fc7).





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
@rxin Moreover, with this pr we can compute accurate equi-width histogram, 
while count min sketch only gets an estimated result? I think it's better to 
have an accurate result given that both methods need a table scan.
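
The accuracy trade-off can be seen in a toy count-min sketch. This is a simplified stand-in written for this note, not Spark's `CountMinSketch` class; `TinyCountMin` and its parameters are hypothetical. Each row hashes the item to one counter, and the estimate is the minimum over rows, so it can overcount but never undercount.

```scala
import scala.util.hashing.MurmurHash3

// depth = number of hash rows, width = counters per row.
final class TinyCountMin(depth: Int, width: Int) {
  private val table = Array.ofDim[Long](depth, width)

  // Row-specific bucket: the row index doubles as the hash seed.
  private def bucket(item: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(item, row)
    ((h % width) + width) % width
  }

  def add(item: String): Unit =
    for (r <- 0 until depth) table(r)(bucket(item, r)) += 1L

  // Minimum over rows: an upper bound on the true count.
  def estimate(item: String): Long =
    (0 until depth).map(r => table(r)(bucket(item, r))).min
}
```

With a width of 1 every key collides, so after adding "a" three times and "b" once the estimate for "b" is 4 rather than 1; wider tables make such collisions rarer but do not remove them, whereas an exact frequency map has no such error.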


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #68171 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68171/consoleFull)**
 for PR 11105 at commit 
[`6983bfa`](https://github.com/apache/spark/commit/6983bfa5131ccaa5c4d9ebfb86de7871d4c6a60f).





[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
@rxin About the name "map_aggregate", actually it was suggested by srinath, 
do you have a better name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15637
  
@rxin By "histograms" for numeric columns, do you mean equi-height 
histograms? The main purpose of this pr is to construct an equi-width histogram 
without prior knowledge of ndv, so that we can compute it with other stats in 
the same scan. And an equi-width histogram is important for both string and 
numeric (including timestamp and date) types, because it provides more accurate 
estimation than an equi-height histogram.





[GitHub] spark issue #15778: [SPARK-18283][Structured Streaming][Kafka] Added test to...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15778
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68169/
Test PASSed.





[GitHub] spark issue #15778: [SPARK-18283][Structured Streaming][Kafka] Added test to...

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15778
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15778: [SPARK-18283][Structured Streaming][Kafka] Added test to...

2016-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15778
  
**[Test build #68169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68169/consoleFull)**
for PR 15778 at commit [`2f3b444`](https://github.com/apache/spark/commit/2f3b444d1af1aab8d0c96c99218e2dd1e432f416).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15771: [SPARK-18260] Make from_json null safe

2016-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15771
  
Merged build finished. Test PASSed.




