[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15843 **[Test build #68513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68513/consoleFull)** for PR 15843 at commit [`3d858a2`](https://github.com/apache/spark/commit/3d858a2326809b7e1c679b712d84a8a21767d13c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Merged build finished. Test FAILed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704

**[Test build #68512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68512/consoleFull)** for PR 15704 at commit [`a815df9`](https://github.com/apache/spark/commit/a815df9b9eb840d410565f13f89e899204cab341).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68512/ Test FAILed.
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15704#discussion_r87549789

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

@@ -225,6 +226,102 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-17732: Drop partitions by filter") {
+    withTable("sales") {
+      sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
+
+      for (country <- Seq("US", "CA", "KR")) {
+        for (quarter <- 1 to 4) {
+          sql(s"ALTER TABLE sales ADD PARTITION (country='$country', quarter='$quarter')")
+        }
+      }
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR', quarter > '2')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=CA/quarter=1") ::
+        Row("country=CA/quarter=2") ::
+        Row("country=KR/quarter=1") ::
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=1") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR'), PARTITION (quarter <= '1')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country='KR', quarter='4')")
+      sql("ALTER TABLE sales DROP PARTITION (country='US', quarter='3')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (quarter <= 2), PARTITION (quarter >= '4')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=3") :: Nil)
+
+      val m = intercept[AnalysisException] {
+        sql("ALTER TABLE sales DROP PARTITION (quarter <= 4), PARTITION (quarter <= '2')")
+      }.getMessage
+      // `PARTITION (quarter <= '2')` should raise an exception because `PARTITION (quarter <= 4)`
+      // already removes all partitions.

--- End diff --

As we have discussed before, this behavior may not be the same as ALTER TABLE DROP PARTITION with an equality-only spec. Should we make them consistent?
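The filter semantics debated above can be sketched in plain Python (this is an illustration of the proposed behavior, not Spark's implementation; all names here are hypothetical). Each `PARTITION (...)` clause is a conjunction of per-column comparisons, values are compared as strings for the string-typed partition columns, and a partition is dropped when it matches any clause:

```python
# Illustrative sketch (not Spark's implementation) of DROP PARTITION with
# comparison filters: a clause is a list of (column, op, literal) predicates,
# and a partition is dropped when it satisfies every predicate of any clause.
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

def matches(partition, clause):
    # All predicates in one PARTITION (...) clause must hold (conjunction).
    return all(OPS[op](partition[col], lit) for col, op, lit in clause)

def drop_partitions(partitions, clauses):
    # Keep partitions that match none of the DROP clauses.
    return [p for p in partitions
            if not any(matches(p, c) for c in clauses)]

partitions = [{"country": c, "quarter": str(q)}
              for c in ("US", "CA", "KR") for q in range(1, 5)]

# ALTER TABLE sales DROP PARTITION (country < 'KR', quarter > '2')
remaining = drop_partitions(partitions, [[("country", "<", "KR"),
                                          ("quarter", ">", "2")]])
print(len(remaining))  # 10 -- only CA quarters 3 and 4 are dropped
```

Note how this differs from the equality-only spec: a filter that matches nothing simply drops nothing, which is the consistency question raised above.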
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley looks good, merged
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68512/consoleFull)** for PR 15704 at commit [`a815df9`](https://github.com/apache/spark/commit/a815df9b9eb840d410565f13f89e899204cab341).
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 You're right! It's another bug: copy should be implemented in JavaParams, not JavaModel. I'm sending this PR to fix that: https://github.com/techaddict/spark/pull/1 Can you please check it out and merge it into your PR if it looks OK to you? All pyspark.ml tests ran successfully with it.
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15704#discussion_r87549043

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

@@ -226,6 +227,63 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-17732: Drop partitions by filter") {
+    withTable("sales") {
+      sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
+
+      for (country <- Seq("US", "CA", "KR")) {
+        for (quarter <- 1 to 4) {
+          sql(s"ALTER TABLE sales ADD PARTITION (country='$country', quarter='$quarter')")
+        }
+      }
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=1") ::
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=1") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (quarter <= '2')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country='KR', quarter='4')")
+      sql("ALTER TABLE sales DROP PARTITION (country='US', quarter='3')")

--- End diff --

I added the case by updating the existing test cases.
[GitHub] spark issue #15850: [SPARK-18411] [SQL] Add Argument Types and Test Cases fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15850 **[Test build #68511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68511/consoleFull)** for PR 15850 at commit [`c02b10d`](https://github.com/apache/spark/commit/c02b10d21d3d1ebaddd93c58112e67dc7ef0).
[GitHub] spark pull request #15850: [SPARK-18411] [SQL] Add Argument Types and Test C...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15850

[SPARK-18411] [SQL] Add Argument Types and Test Cases for String Functions [WIP]

### What changes were proposed in this pull request?

Add argument types and test cases into the extended descriptions of string functions.

### How was this patch tested?

Added test cases to verify the added argument types.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark addArgument4StringExpressions

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15850.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15850

commit c02b10d21d3d1ebaddd93c58112e67dc7ef0
Author: gatorsmile
Date: 2016-11-11T07:29:16Z

    fix
[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15593#discussion_r87547519

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---

@@ -1486,57 +1504,65 @@ private class LogisticAggregator(
     var marginOfLabel = 0.0
     var maxMargin = Double.NegativeInfinity
 
-    val margins = Array.tabulate(numClasses) { i =>
-      var margin = 0.0
-      features.foreachActive { (index, value) =>
-        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-          margin += localCoefficients(i * numFeaturesPlusIntercept + index) *
-            value / localFeaturesStd(index)
-        }
+    val margins = new Array[Double](numClasses)
+    features.foreachActive { (index, value) =>
+      val stdValue = value / localFeaturesStd(index)
+      var j = 0
+      while (j < numClasses) {
+        margins(j) += localCoefficients(index * numClasses + j) * stdValue
+        j += 1
       }
-
+    }
+    var i = 0
+    while (i < numClasses) {
       if (fitIntercept) {
-        margin += localCoefficients(i * numFeaturesPlusIntercept + numFeatures)
+        margins(i) += localCoefficients(numClasses * numFeatures + i)
       }
-      if (i == label.toInt) marginOfLabel = margin
-      if (margin > maxMargin) {
-        maxMargin = margin
+      if (i == label.toInt) marginOfLabel = margins(i)
+      if (margins(i) > maxMargin) {
+        maxMargin = margins(i)
       }
-      margin
+      i += 1
     }
 
    /**
     * When maxMargin > 0, the original formula could cause overflow.
     * We address this by subtracting maxMargin from all the margins, so it's guaranteed
     * that all of the new margins will be smaller than zero to prevent arithmetic overflow.
     */
+    val multipliers = new Array[Double](numClasses)
     val sum = {
       var temp = 0.0
-      if (maxMargin > 0) {
-        for (i <- 0 until numClasses) {
-          margins(i) -= maxMargin
-          temp += math.exp(margins(i))
-        }
-      } else {
-        for (i <- 0 until numClasses) {
-          temp += math.exp(margins(i))
-        }
+      var i = 0
+      while (i < numClasses) {
+        if (maxMargin > 0) margins(i) -= maxMargin
+        val exp = math.exp(margins(i))
+        temp += exp
+        multipliers(i) = exp
+        i += 1
       }
       temp
     }
 
-    for (i <- 0 until numClasses) {
-      val multiplier = math.exp(margins(i)) / sum - {
-        if (label == i) 1.0 else 0.0
-      }
-      features.foreachActive { (index, value) =>
-        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-          localGradientArray(i * numFeaturesPlusIntercept + index) +=
-            weight * multiplier * value / localFeaturesStd(index)
+    margins.indices.foreach { i =>
+      multipliers(i) = multipliers(i) / sum - (if (label == i) 1.0 else 0.0)
+    }
+    features.foreachActive { (index, value) =>
+      if (localFeaturesStd(index) != 0.0 && value != 0.0) {
+        val stdValue = value / localFeaturesStd(index)
+        var j = 0
+        while (j < numClasses) {
+          localGradientArray(index * numClasses + j) +=
+            weight * multipliers(j) * stdValue
+          j += 1
         }
       }
-      if (fitIntercept) {
-        localGradientArray(i * numFeaturesPlusIntercept + numFeatures) += weight * multiplier
+    }
+    if (fitIntercept) {
+      var i = 0
+      while (i < numClasses) {
+        localGradientArray(numFeatures * numClasses + i) += weight * multipliers(i)
+        i += 1
      }
    }

--- End diff --

I'm not sure I fully get where you intend to use `foreachActive` over the gradient matrix? Maybe it's the location of this comment that is confusing me... but here in `multinomialUpdateInPlace`, we are iterating over features using `foreachActive`, then for each feature iterating over `numClasses`. If we iterate over the gradient using `foreachActive`, how will that work? Won't it be super inefficient? Perhaps I am missing something about what you intend; could you clarify with an example?
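The loop restructuring under discussion can be illustrated in plain Python (hypothetical names; Spark's actual code lives in `LogisticAggregator`). The point of the patch is to make a single pass over the active entries of a sparse feature vector, updating all class margins per feature, instead of re-scanning the features once per class:

```python
# Plain-Python sketch of the one-pass margin computation discussed above.
# `active` holds the (index, value) pairs of a sparse feature vector, and
# `coefficients` is laid out feature-major: coefficients[index * num_classes + j].

def margins_one_pass(active, coefficients, num_classes, features_std):
    margins = [0.0] * num_classes
    for index, value in active:            # one pass over nonzero features
        std_value = value / features_std[index]
        for j in range(num_classes):       # update every class margin at once
            margins[j] += coefficients[index * num_classes + j] * std_value
    return margins

active = [(0, 2.0), (2, 1.0)]              # sparse vector: indices 0 and 2 nonzero
features_std = [1.0, 1.0, 2.0]
coefficients = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # 3 features x 2 classes
print(margins_one_pass(active, coefficients, 2, features_std))
```

This is also why MLnick's question makes sense: `foreachActive` is natural over the sparse *features*, while the gradient array is dense, so iterating it "actively" would not help.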
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849

**[Test build #68510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68510/consoleFull)** for PR 15849 at commit [`82a4487`](https://github.com/apache/spark/commit/82a4487e5f7d4fe7f7a375cbdf86554882bcdf59).

* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68510/ Test FAILed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15849 Merged build finished. Test FAILed.
[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15849 **[Test build #68510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68510/consoleFull)** for PR 15849 at commit [`82a4487`](https://github.com/apache/spark/commit/82a4487e5f7d4fe7f7a375cbdf86554882bcdf59).
[GitHub] spark pull request #15849: [SPARK-18410][STREAMING] Add structured kafka exa...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/15849

[SPARK-18410][STREAMING] Add structured kafka example

## What changes were proposed in this pull request?

This PR provides structured kafka wordcount examples.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-18410

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15849.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15849

commit 962370cd84aace15b17f8ac58d285a2840def3c4
Author: genmao.ygm
Date: 2016-11-11T06:42:54Z

    SPARK-18410: Add structured kafka example

commit 4f83a1f8559eb038b801b738fdfd90ce003acd92
Author: genmao.ygm
Date: 2016-11-11T07:16:35Z

    update

commit 82a4487e5f7d4fe7f7a375cbdf86554882bcdf59
Author: genmao.ygm
Date: 2016-11-11T07:19:23Z

    update
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87546426

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---

@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The non-codegen version expects GenericArrayData for arrays, except for BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
+      case arr: Array[_] => new GenericArrayData(arr.map(CatalystTypeConverters.convertToCatalyst))

--- End diff --

I think that you are right. This workaround was for previous versions.
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15704#discussion_r87546041

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

@@ -226,6 +227,63 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-17732: Drop partitions by filter") {
+    withTable("sales") {
+      sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
+
+      for (country <- Seq("US", "CA", "KR")) {
+        for (quarter <- 1 to 4) {
+          sql(s"ALTER TABLE sales ADD PARTITION (country='$country', quarter='$quarter')")
+        }
+      }
+
+      sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=1") ::
+        Row("country=KR/quarter=2") ::
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=1") ::
+        Row("country=US/quarter=2") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (quarter <= '2')")
+      checkAnswer(sql("SHOW PARTITIONS sales"),
+        Row("country=KR/quarter=3") ::
+        Row("country=KR/quarter=4") ::
+        Row("country=US/quarter=3") ::
+        Row("country=US/quarter=4") :: Nil)
+
+      sql("ALTER TABLE sales DROP PARTITION (country='KR', quarter='4')")
+      sql("ALTER TABLE sales DROP PARTITION (country='US', quarter='3')")

--- End diff --

To add that, we would need another test case, because the remaining partitions are not enough to test it.
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley yes, I did it for `JavaWrapper` first, but running the tests with it gives https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68478/consoleFull
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 Thanks a lot for finding & reporting this! The fix should probably go in JavaWrapper, not JavaModel, right? I tested this manually (in JavaWrapper), and it seems to fix the problematic case with StringIndexer.
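The leak pattern being fixed here can be sketched in plain Python (mock gateway; hypothetical names, not PySpark's actual classes). A Python wrapper holds a reference to a JVM-side object; if nothing releases that reference when the wrapper is garbage-collected, the JVM object lives forever. The sketch adds cleanup on collection:

```python
# Illustrative sketch of the wrapper-cleanup pattern under discussion.
# MockGateway stands in for the Py4J gateway; in PySpark the real fix
# lives in the ML wrapper classes, not in these hypothetical names.

class MockGateway:
    """Tracks which JVM-side objects are currently referenced from Python."""
    def __init__(self):
        self.attached = set()

    def attach(self, obj_id):
        self.attached.add(obj_id)

    def detach(self, obj_id):
        self.attached.discard(obj_id)

class JavaObjectWrapper:
    def __init__(self, gateway, obj_id):
        self._gateway = gateway
        self._obj_id = obj_id
        gateway.attach(obj_id)

    def __del__(self):
        # Release the JVM-side reference when the Python wrapper is collected.
        self._gateway.detach(self._obj_id)

gateway = MockGateway()
wrapper = JavaObjectWrapper(gateway, "StringIndexerModel@1")
assert "StringIndexerModel@1" in gateway.attached
del wrapper                     # wrapper collected, JVM object released
print(gateway.attached)         # set()
```

The subtlety debated in this thread is *where* the cleanup belongs in the class hierarchy, so that `copy()` and model wrappers all inherit it without double-freeing.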
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15840 Merged build finished. Test PASSed.
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15840 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68509/ Test PASSed.
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15840

**[Test build #68509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68509/consoleFull)** for PR 15840 at commit [`deffccd`](https://github.com/apache/spark/commit/deffccdf2ef314acf3de94c4fbd33655e65e24e2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15704 Thank you, @viirya! I feel the same way.
[GitHub] spark pull request #15847: [SPARK-18387] [SQL] Add serialization to checkEva...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15847#discussion_r87541254

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -1431,43 +1431,49 @@ case class FormatNumber(x: Expression, d: Expression)
   // Associated with the pattern, for the last d value, and we will update the
   // pattern (DecimalFormat) once the new coming d value differ with the last one.
+  // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after
+  // serialization (numberFormat has not been updated for dValue = 0).
   @transient
-  private var lastDValue: Int = -100
+  private var lastDValue: Option[Int] = None

--- End diff --

Actually you can do this via a lazy val too, which is just

```
@transient private lazy val lastDValue: Int = -100
```

then I believe it gets initialized to -100 after deserialization automatically.
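The transient-field concern above has a direct analogue in Python pickling, sketched below as an illustration (names are made up, not Spark's). A cached sentinel value is excluded from the serialized state and reset to its default on deserialization, mirroring how a `@transient lazy val` recomputes its value on first use after deserialization:

```python
# Hedged Python analogy to the @transient discussion: drop a cached field
# from the pickled state and re-initialize it after deserialization.
import pickle

class FormatState:
    def __init__(self):
        self.last_d_value = -100   # sentinel: "format not yet configured"

    def __getstate__(self):
        # Exclude the transient cache from serialization.
        state = self.__dict__.copy()
        state.pop("last_d_value", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.last_d_value = -100   # reset on deserialization, like lazy re-init

s = FormatState()
s.last_d_value = 0                 # 0 is a valid, configured value
restored = pickle.loads(pickle.dumps(s))
print(restored.last_d_value)       # -100: cache is rebuilt, not carried over
```

This is exactly why the PR's `Option[Int]` (or the suggested lazy val) works: after deserialization the field cannot be confused with a legitimately configured value of 0.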
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87538921

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---

@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The non-codegen version expects GenericArrayData for arrays, except for BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
+      case arr: Array[_] => new GenericArrayData(arr.map(CatalystTypeConverters.convertToCatalyst))

--- End diff --

I think they are the same here, aren't they?
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15704 The changes to parsing look good to me.
[GitHub] spark issue #15790: [SPARK-18264][SPARKR] build vignettes with package, upda...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15790 So one proposal I was thinking of is to just check in a built version of the vignette into the source tree. That way the release packaging wouldn't need to change. The only thing to keep in mind is that whenever we update the vignette we will need to rebuild it. Thoughts?
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68506/ Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Merged build finished. Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68506/consoleFull)** for PR 13706 at commit [`e895a9c`](https://github.com/apache/spark/commit/e895a9c7b89d2a53f6747f1e7fa08f8e97b80ed4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/15840 @hvanhovell @kiszk I tried to use `CodegenContext.nullSafeExec()` in `MapObjects` as an example. If you can bear with this for now, I'll apply it to the other places that generate nullability-checking code.
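For readers unfamiliar with the helper: `CodegenContext.nullSafeExec` wraps a generated code fragment in an `if (!isNull)` guard when the input is nullable, and emits the fragment unchanged otherwise. A rough Java rendering of the idea (the real method is Scala inside `CodegenContext`; names here are illustrative):

```java
public class NullSafeExecSketch {
    // Wrap `execute` in a null guard only when the input can actually be null,
    // so non-nullable inputs pay no extra branch in the generated code.
    static String nullSafeExec(boolean nullable, String isNullVar, String execute) {
        if (nullable) {
            return "if (!" + isNullVar + ") {\n" + execute + "\n}";
        }
        return execute;
    }

    public static void main(String[] args) {
        // Nullable input: the fragment is guarded.
        System.out.println(nullSafeExec(true, "inputIsNull", "result = input + 1;"));
        // Non-nullable input: the fragment is emitted as-is.
        System.out.println(nullSafeExec(false, "inputIsNull", "result = input + 1;"));
    }
}
```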
[GitHub] spark issue #15840: [SPARK-18398][SQL] Fix nullabilities of MapObjects and o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15840 **[Test build #68509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68509/consoleFull)** for PR 15840 at commit [`deffccd`](https://github.com/apache/spark/commit/deffccdf2ef314acf3de94c4fbd33655e65e24e2).
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Merged build finished. Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13706 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68505/ Test PASSed.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68505/consoleFull)** for PR 13706 at commit [`fb8b57a`](https://github.com/apache/spark/commit/fb8b57a4d46f6856dc2c883c6e995c248dda6a3b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68507/ Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #68507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68507/consoleFull)** for PR 15172 at commit [`6863efe`](https://github.com/apache/spark/commit/6863efe77118f91c0f849d34d4698dad608213b1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15704 Thank you, @hvanhovell. This PR has become much more concise thanks to your advice.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68504/ Test PASSed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Merged build finished. Test PASSed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #68504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68504/consoleFull)** for PR 15702 at commit [`de601bb`](https://github.com/apache/spark/commit/de601bb4fdb9e5a45bddefa31de38bbb7fc2570f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 Also, if there is any testing or anything else I can do or coordinate that would help y'all feel comfortable with this, please let me know :)
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Merged build finished. Test PASSed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68503/ Test PASSed.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68503/consoleFull)** for PR 15704 at commit [`c9e7c06`](https://github.com/apache/spark/commit/c9e7c069b9a5c429a2dd73d3f542ddd045e3b876).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15742: [SPARK-16808][Core] History Server main page does not ho...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/15742 ok. will look into that
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15820 Merged build finished. Test PASSed.
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15820 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68508/ Test PASSed.
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15820 **[Test build #68508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68508/consoleFull)** for PR 15820 at commit [`3aa9d7e`](https://github.com/apache/spark/commit/3aa9d7e6ebbf4b0362f9ce58d97012dd5be96bce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15847: [SPARK-18387] [SQL] Add serialization to checkEvaluation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68502/ Test PASSed.
[GitHub] spark issue #15847: [SPARK-18387] [SQL] Add serialization to checkEvaluation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15847 Merged build finished. Test PASSed.
[GitHub] spark issue #15847: [SPARK-18387] [SQL] Add serialization to checkEvaluation...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15847 **[Test build #68502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68502/consoleFull)** for PR 15847 at commit [`8e829ae`](https://github.com/apache/spark/commit/8e829ae87b197de2ff4b8777202a47d5f1204c56).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user skanjila commented on the issue: https://github.com/apache/spark/pull/15848 Also, I would really love to avoid having to rebase this pull request yet again because more commits are needed :)
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user skanjila commented on the issue: https://github.com/apache/spark/pull/15848 Yes to your first comment on PageViewStream. As for TestSQLContext, it was unfortunately missed; I'll add it with the next pull request, which will also contain the code fixes for the Python unit tests. Is it ok to merge this one without the things you mention?
[GitHub] spark pull request #15800: [SPARK-18334] MinHash should use binary hash dist...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15800#discussion_r87528327

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -76,7 +72,19 @@ class MinHashModel private[ml] (
   @Since("2.1.0")
   override protected[ml] def hashDistance(x: Vector, y: Vector): Double = {
     // Since it's generated by hashing, it will be a pair of dense vectors.
-    x.toDense.values.zip(y.toDense.values).map(pair => math.abs(pair._1 - pair._2)).min
+    if (x.toDense.values.zip(y.toDense.values).exists(pair => pair._1 == pair._2)) {
--- End diff --

I agree more with the comment from @jkbradley at https://github.com/apache/spark/pull/15800#issuecomment-259298082, if I understand some of the terms here correctly. Does the indicator mean a matching hash value between two vectors for one hash function, i.e., h_i? If this understanding is correct, I think averaging the indicators should be the right way to compute MinHash's hash distance.
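To make the two candidate semantics concrete: the code in the diff treats the distance as 0 as soon as any hash slot matches, while the averaging proposal estimates the distance as the fraction of non-matching slots (an estimate of 1 minus Jaccard similarity). A small, hypothetical Java sketch comparing the two (helper names are illustrative, not the ML API):

```java
public class MinHashDistanceSketch {
    // Distance 0 iff at least one hash function produced a collision
    // (the behavior of the diff above).
    static double anyMatchDistance(double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] == y[i]) return 0.0;
        }
        return 1.0;
    }

    // Average of per-function match indicators: estimates 1 - Jaccard
    // similarity (the behavior the averaging proposal argues for).
    static double averagedDistance(double[] x, double[] y) {
        int matches = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == y[i]) matches++;
        }
        return 1.0 - (double) matches / x.length;
    }

    public static void main(String[] args) {
        double[] a = {1, 5, 9, 13};
        double[] b = {1, 5, 2, 7};  // 2 of 4 slots match
        System.out.println(anyMatchDistance(a, b));  // 0.0
        System.out.println(averagedDistance(a, b));  // 0.5
    }
}
```

The averaged version discriminates between "barely similar" and "nearly identical" pairs, which the any-match version collapses to the same distance.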
[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87527804

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala ---
@@ -83,6 +86,129 @@ private[kafka010] case class CachedKafkaConsumer private(
     record
   }

+  /**
+   * Get the record at the `offset`. If it doesn't exist, try to get the earliest record in
+   * `[offset, untilOffset)`.
+   */
+  def getAndIgnoreLostData(
+      offset: Long,
+      untilOffset: Long,
+      pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+    // scalastyle:off
+    // When `failOnDataLoss` is `false`, we need to handle the following cases (note: untilOffset and latestOffset are exclusive):
+    // 1. Some data are aged out, and `offset < beginningOffset <= untilOffset - 1 <= latestOffset - 1`
+    //    Seek to the beginningOffset and fetch the data.
+    // 2. Some data are aged out, and `offset <= untilOffset - 1 < beginningOffset`.
+    //    There is nothing to fetch, return null.
+    // 3. The topic is deleted.
+    //    There is nothing to fetch, return null.
+    // 4. The topic is deleted and recreated, and `beginningOffset <= offset <= untilOffset - 1 <= latestOffset - 1`.
+    //    We cannot detect this case. We can still fetch data like nothing happens.
+    // 5. The topic is deleted and recreated, and `beginningOffset <= offset < latestOffset - 1 < untilOffset - 1`.
+    //    Same as 4.
+    // 6. The topic is deleted and recreated, and `beginningOffset <= latestOffset - 1 < offset <= untilOffset - 1`.
+    //    There is nothing to fetch, return null.
+    // 7. The topic is deleted and recreated, and `offset < beginningOffset <= untilOffset - 1`.
+    //    Same as 1.
+    // 8. The topic is deleted and recreated, and `offset <= untilOffset - 1 < beginningOffset`.
+    //    There is nothing to fetch, return null.
+    // scalastyle:on
+    require(offset < untilOffset, s"offset: $offset, untilOffset: $untilOffset")
+    logDebug(s"Get $groupId $topicPartition nextOffset $nextOffsetInFetchedData requested $offset")
+    try {
+      if (offset != nextOffsetInFetchedData) {
+        logInfo(s"Initial fetch for $topicPartition $offset")
+        seek(offset)
+        poll(pollTimeoutMs)
+      } else if (!fetchedData.hasNext()) {
+        // The last pre-fetched data has been drained.
+        poll(pollTimeoutMs)
+      }
+      getRecordFromFetchedData(offset, untilOffset)
+    } catch {
+      case e: OffsetOutOfRangeException =>
+        logWarning(s"Cannot fetch offset $offset, try to recover from the beginning offset", e)
+        advanceToBeginningOffsetAndFetch(offset, untilOffset, pollTimeoutMs)
+    }
+  }
+
+  /**
+   * Try to advance to the beginning offset and fetch again. `beginningOffset` should be in
+   * `[offset, untilOffset]`. If not, it will try to fetch `offset` again if it's in
+   * `[beginningOffset, latestOffset)`. Otherwise, it will return null and reset the pre-fetched
+   * data.
+   */
+  private def advanceToBeginningOffsetAndFetch(
+      offset: Long,
+      untilOffset: Long,
+      pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+    val beginningOffset = getBeginningOffset()
+    if (beginningOffset <= offset) {
+      val latestOffset = getLatestOffset()
+      if (latestOffset <= offset) {
+        // beginningOffset <= latestOffset - 1 < offset <= untilOffset - 1
+        logWarning(s"Offset ${offset} is later than the latest offset $latestOffset. " +
+          s"Skipped [$offset, $untilOffset)")
+        reset()
+        null
+      } else {
+        // beginningOffset <= offset <= min(latestOffset - 1, untilOffset - 1)
+        getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
+      }
+    } else {
+      if (beginningOffset >= untilOffset) {
+        // offset <= untilOffset - 1 < beginningOffset
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $untilOffset)")
+        reset()
+        null
+      } else {
+        // offset < beginningOffset <= untilOffset - 1
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $beginningOffset)")
+        getAndIgnoreLostData(beginningOffset, untilOffset, pollTimeoutMs)
+      }
+    }
+  }
+
+  /**
+   * Get the earliest record in [offset, untilOffset) from the fetched data. If there is no such
+   * record, returns null. Must be called after `poll`.
+   */
+  private def getRecordFromFetchedData(
+      offset: Long,
+      untilOffset: Long): Co
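The eight cases above collapse into a small decision over four offsets. As a self-contained sketch, detached from the Kafka consumer (plain Java; `-1` stands in for the "return null" outcomes, and the function name is illustrative):

```java
public class OffsetRecoverySketch {
    /**
     * Decide where to resume after an offset-out-of-range condition.
     * Returns the offset to retry from, or -1 if there is nothing left to fetch.
     * Mirrors the branch structure of advanceToBeginningOffsetAndFetch above.
     */
    static long recoverFrom(long offset, long untilOffset,
                            long beginningOffset, long latestOffset) {
        if (beginningOffset <= offset) {
            if (latestOffset <= offset) {
                // beginningOffset <= latestOffset - 1 < offset: nothing to fetch
                return -1;
            }
            // offset is still in range; retry it
            return offset;
        }
        if (beginningOffset >= untilOffset) {
            // the whole requested range [offset, untilOffset) was aged out
            return -1;
        }
        // skip the aged-out prefix and resume at the beginning offset
        return beginningOffset;
    }

    public static void main(String[] args) {
        System.out.println(recoverFrom(10, 20, 5, 8));   // -1: offset past latest
        System.out.println(recoverFrom(10, 20, 5, 50));  // 10: offset still valid
        System.out.println(recoverFrom(10, 20, 25, 60)); // -1: whole range aged out
        System.out.println(recoverFrom(10, 20, 15, 60)); // 15: resume at beginning
    }
}
```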
[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15820 **[Test build #68508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68508/consoleFull)** for PR 15820 at commit [`3aa9d7e`](https://github.com/apache/spark/commit/3aa9d7e6ebbf4b0362f9ce58d97012dd5be96bce).
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 cc: @jkbradley @davies @holdenk
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 cc: @sethah @marmbrus
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15848 Just to be clear, from the discussion in the JIRA, it seems the `PageViewStream` example is intentionally left out here for now, because changing it from `local[2]` to `local[4]` fails for an unknown reason? And it seems
```
./sql/core/src/test/scala/org/apache/spark/sql/test/TestSQLContext.scala: this(new SparkContext("local[2]", "test-sql-context",
```
was missed?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87525792 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
```diff
@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The no-codegen path expects GenericArrayData for arrays, except BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
```
--- End diff -- I don't see `convertToCatalyst` special-casing `BinaryType` before. Why do we need to do that now?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r87525740 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
```diff
@@ -43,11 +43,38 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
   protected def checkEvaluation(
       expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
-    val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
+    // The no-codegen path expects GenericArrayData for arrays, except BinaryType
+    val catalystValue = expected match {
+      case arr: Array[Byte] if expression.dataType == BinaryType => arr
+      case arr: Array[_] => new GenericArrayData(arr.map(CatalystTypeConverters.convertToCatalyst))
```
--- End diff -- Doesn't this line do the same thing `CatalystTypeConverters.convertToCatalyst` does? I don't see `convertToCatalyst` special-casing `BinaryType` before. Why do we need to do that now?
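[Editorial note: for readers following the thread, the distinction under discussion is that Catalyst keeps `BinaryType` values as raw `Array[Byte]` internally, while other array values are wrapped in `GenericArrayData`. The following is a toy sketch of that dispatch only — the types are simplified stand-ins, not Spark's actual `CatalystTypeConverters` logic.]

```scala
// Toy model of the conversion convention being debated above.
// ToyBinaryType / ToyGenericArrayData are illustrative stand-ins.
sealed trait ToyDataType
case object ToyBinaryType extends ToyDataType
case object ToyIntArrayType extends ToyDataType

final case class ToyGenericArrayData(values: Seq[Any])

def toCatalyst(expected: Any, dataType: ToyDataType): Any =
  (expected, dataType) match {
    // Binary columns stay as raw byte arrays -- no wrapping.
    case (arr: Array[Byte], ToyBinaryType) => arr
    // Every other array gets wrapped in the array-data container.
    case (arr: Array[_], _)                => ToyGenericArrayData(arr.toSeq)
    // Primititves and everything else pass through unchanged.
    case (other, _)                        => other
  }
```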
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #68507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68507/consoleFull)** for PR 15172 at commit [`6863efe`](https://github.com/apache/spark/commit/6863efe77118f91c0f849d34d4698dad608213b1).
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15172 retest this please
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68499/ Test FAILed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test FAILed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #68499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68499/consoleFull)** for PR 15172 at commit [`6863efe`](https://github.com/apache/spark/commit/6863efe77118f91c0f849d34d4698dad608213b1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15742: [SPARK-16808][Core] History Server main page does not ho...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15742 It doesn't merge cleanly into branch-2.0. You can file a separate PR if you want it fixed there.
[GitHub] spark issue #15742: [SPARK-16808][Core] History Server main page does not ho...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/15742 @vanzin since this is a regression bug in 2.0, is there any particular reason it was merged only to 2.1? I believe this should be in 2.0.2/3 as well.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68506/consoleFull)** for PR 13706 at commit [`e895a9c`](https://github.com/apache/spark/commit/e895a9c7b89d2a53f6747f1e7fa08f8e97b80ed4).
[GitHub] spark pull request #15652: [SPARK-16987] [None] Add spark-default.conf prope...
Github user hayashidac commented on a diff in the pull request: https://github.com/apache/spark/pull/15652#discussion_r87522936 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
```diff
@@ -85,8 +85,10 @@ private[ml] trait DecisionTreeParams extends PredictorParams
    * (default = 256 MB)
    * @group expertParam
    */
-  final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB",
-    "Maximum memory in MB allocated to histogram aggregation.",
+  final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB", "Maximum memory in MB" +
```
--- End diff -- I rebased to remove the unrelated changes. Please confirm.
[GitHub] spark issue #15790: [SPARK-18264][SPARKR] build vignettes with package, upda...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15790 I think https://github.com/apache/spark/blob/master/dev/make-distribution.sh should change too, but I'm not 100% sure how the R package is built there.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13706 **[Test build #68505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68505/consoleFull)** for PR 13706 at commit [`fb8b57a`](https://github.com/apache/spark/commit/fb8b57a4d46f6856dc2c883c6e995c248dda6a3b).
[GitHub] spark issue #15846: [CORE][Minor]:remove unused import in SparkContext.scala
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15846 Merged build finished. Test PASSed.
[GitHub] spark issue #15846: [CORE][Minor]:remove unused import in SparkContext.scala
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15846 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68494/ Test PASSed.
[GitHub] spark issue #15846: [CORE][Minor]:remove unused import in SparkContext.scala
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15846 **[Test build #68494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68494/consoleFull)** for PR 15846 at commit [`f14fba5`](https://github.com/apache/spark/commit/f14fba5bd676f48ab4936a8181f48253b7cbfc40). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user lianhuiwang commented on the issue: https://github.com/apache/spark/pull/13706 @hvanhovell I have updated this PR. Can you take a look? Thanks.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68501/ Test PASSed.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11122 Merged build finished. Test PASSed.
[GitHub] spark issue #11122: [SPARK-13027][STREAMING] Added batch time as a parameter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11122 **[Test build #68501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68501/consoleFull)** for PR 11122 at commit [`fe68b6c`](https://github.com/apache/spark/commit/fe68b6c03300a37799ccaad2ee554bde005c8f6f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15842: [SPARK-18401][SPARKR][ML] SparkR random forest sh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15842
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14612 Merged build finished. Test PASSed.
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68495/ Test PASSed.
[GitHub] spark issue #15842: [SPARK-18401][SPARKR][ML] SparkR random forest should su...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15842 Merged into master and branch-2.1. Thanks for reviewing.
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14612 **[Test build #68495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68495/consoleFull)** for PR 14612 at commit [`3035cfe`](https://github.com/apache/spark/commit/3035cfe6c05eb74c36412c833c89864bc2126a63). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15814
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #68504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68504/consoleFull)** for PR 15702 at commit [`de601bb`](https://github.com/apache/spark/commit/de601bb4fdb9e5a45bddefa31de38bbb7fc2570f).
[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15814 Merging in master/branch-2.1. Thanks.
[GitHub] spark pull request #15847: [SPARK-18387] [SQL] Add serialization to checkEva...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15847#discussion_r87519874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
```diff
@@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param name The short name of the function
  */
 abstract class LeafMathExpression(c: Double, name: String)
-  extends LeafExpression with CodegenFallback {
+  extends LeafExpression with CodegenFallback with Serializable {
```
--- End diff -- Does this actually matter? All expressions should be case classes, which means they are already serializable.
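[Editorial note: rxin's point rests on the fact that Scala case classes extend `scala.Serializable` automatically, so the explicit mixin is redundant for them. A minimal self-contained sketch, with a hypothetical `AddExpr` standing in for a real expression node:]

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical toy expression node: note there is no `with Serializable` anywhere.
case class AddExpr(left: Int, right: Int)

// Round-trip any reference value through Java serialization.
def roundTrip[T <: AnyRef](value: T): T = {
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(value)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[T]
}

// Case classes are serializable by default, so this succeeds:
// roundTrip(AddExpr(1, 2)) == AddExpr(1, 2)
```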
[GitHub] spark pull request #15847: [SPARK-18387] [SQL] Add serialization to checkEva...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15847#discussion_r87519832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
```diff
@@ -1431,43 +1431,49 @@ case class FormatNumber(x: Expression, d: Expression)
   // Associated with the pattern, for the last d value, and we will update the
   // pattern (DecimalFormat) once the new coming d value differs from the last one.
+  // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after
+  // serialization (numberFormat has not been updated for dValue = 0).
   @transient
-  private var lastDValue: Int = -100
+  private var lastDValue: Option[Int] = None
```
--- End diff -- Any perf penalty to doing it this way?
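[Editorial note: the motivation for the `Option` is that a `@transient` field is not restored on deserialization — a transient `Int` comes back as the JVM default `0`, not as its initializer `-100` — so a legitimate cached `dValue` of 0 becomes indistinguishable from "never computed". A minimal demonstration of the pitfall, with a hypothetical `Cached` class standing in for `FormatNumber`:]

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for FormatNumber's cached state; not the Spark class.
class Cached extends Serializable {
  @transient var lastD: Int = -100 // sentinel meaning "never computed"
}

def roundTrip[T <: AnyRef](value: T): T = {
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(value)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[T]
}

val c = new Cached
c.lastD = 0 // a real, valid cached value of 0

// Serialization skips the transient field, and field initializers do not
// re-run on deserialization, so lastD comes back as the JVM default 0 --
// indistinguishable from the valid cached 0. Wrapping in Option (which
// deserializes to null, a testable state) restores the distinction.
// roundTrip(c).lastD == 0
```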
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15702 jenkins, test this please
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68496/ Test FAILed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15702 Merged build finished. Test FAILed.
[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #68496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68496/consoleFull)** for PR 15702 at commit [`de601bb`](https://github.com/apache/spark/commit/de601bb4fdb9e5a45bddefa31de38bbb7fc2570f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #68503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68503/consoleFull)** for PR 15704 at commit [`c9e7c06`](https://github.com/apache/spark/commit/c9e7c069b9a5c429a2dd73d3f542ddd045e3b876).
[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15593#discussion_r87518789

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1486,57 +1504,65 @@ private class LogisticAggregator(
       var marginOfLabel = 0.0
       var maxMargin = Double.NegativeInfinity
 
-      val margins = Array.tabulate(numClasses) { i =>
-        var margin = 0.0
-        features.foreachActive { (index, value) =>
-          if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-            margin += localCoefficients(i * numFeaturesPlusIntercept + index) *
-              value / localFeaturesStd(index)
-          }
+      val margins = new Array[Double](numClasses)
+      features.foreachActive { (index, value) =>
+        val stdValue = value / localFeaturesStd(index)
+        var j = 0
+        while (j < numClasses) {
+          margins(j) += localCoefficients(index * numClasses + j) * stdValue
+          j += 1
         }
-
+      }
+      var i = 0
+      while (i < numClasses) {
         if (fitIntercept) {
-          margin += localCoefficients(i * numFeaturesPlusIntercept + numFeatures)
+          margins(i) += localCoefficients(numClasses * numFeatures + i)
         }
-        if (i == label.toInt) marginOfLabel = margin
-        if (margin > maxMargin) {
-          maxMargin = margin
+        if (i == label.toInt) marginOfLabel = margins(i)
+        if (margins(i) > maxMargin) {
+          maxMargin = margins(i)
         }
-        margin
+        i += 1
       }
 
       /**
        * When maxMargin > 0, the original formula could cause overflow.
        * We address this by subtracting maxMargin from all the margins, so it's guaranteed
       * that all of the new margins will be smaller than zero to prevent arithmetic overflow.
        */
+      val multipliers = new Array[Double](numClasses)
       val sum = {
         var temp = 0.0
-        if (maxMargin > 0) {
-          for (i <- 0 until numClasses) {
-            margins(i) -= maxMargin
-            temp += math.exp(margins(i))
-          }
-        } else {
-          for (i <- 0 until numClasses) {
-            temp += math.exp(margins(i))
-          }
+        var i = 0
+        while (i < numClasses) {
+          if (maxMargin > 0) margins(i) -= maxMargin
+          val exp = math.exp(margins(i))
+          temp += exp
+          multipliers(i) = exp
+          i += 1
         }
         temp
       }
 
-      for (i <- 0 until numClasses) {
-        val multiplier = math.exp(margins(i)) / sum - {
-          if (label == i) 1.0 else 0.0
-        }
-        features.foreachActive { (index, value) =>
-          if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-            localGradientArray(i * numFeaturesPlusIntercept + index) +=
-              weight * multiplier * value / localFeaturesStd(index)
+      margins.indices.foreach { i =>
+        multipliers(i) = multipliers(i) / sum - (if (label == i) 1.0 else 0.0)
+      }
+      features.foreachActive { (index, value) =>
+        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
+          val stdValue = value / localFeaturesStd(index)
+          var j = 0
+          while (j < numClasses) {
+            localGradientArray(index * numClasses + j) +=
+              weight * multipliers(j) * stdValue
+            j += 1
           }
         }
-        if (fitIntercept) {
-          localGradientArray(i * numFeaturesPlusIntercept + numFeatures) += weight * multiplier
+      }
+      if (fitIntercept) {
+        var i = 0
+        while (i < numClasses) {
+          localGradientArray(numFeatures * numClasses + i) += weight * multipliers(i)
+          i += 1
         }
       }
--- End diff --

You can make `def gradient: Vector` return a `Matrix` instead; for MLOR, the implementation can be a column-major matrix, so when we use `foreachActive` we don't need to worry about the underlying implementation.
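The key idea in the diff above is the column-major coefficient layout: the coefficient for class `j` and feature `index` lives at `index * numClasses + j`, so a single pass over the active features updates all class margins while each feature value is scaled by its standard deviation only once. A minimal standalone sketch (hypothetical names; dense arrays stand in for Spark's `Vector.foreachActive`; not the actual Spark implementation):

```scala
// Sketch of the column-major margin computation from the patch, assuming
// coefficients are laid out as numFeatures * numClasses entries followed
// by numClasses intercepts. Hypothetical, self-contained stand-in.
def colMajorMargins(
    features: Array[Double],
    featuresStd: Array[Double],
    coefficients: Array[Double], // numFeatures * numClasses, then intercepts
    numClasses: Int,
    fitIntercept: Boolean): Array[Double] = {
  val numFeatures = features.length
  val margins = new Array[Double](numClasses)
  var index = 0
  while (index < numFeatures) {
    val value = features(index)
    if (featuresStd(index) != 0.0 && value != 0.0) {
      val stdValue = value / featuresStd(index) // scale once per feature
      var j = 0
      while (j < numClasses) {
        // entry for (feature = index, class = j) in column-major layout
        margins(j) += coefficients(index * numClasses + j) * stdValue
        j += 1
      }
    }
    index += 1
  }
  if (fitIntercept) {
    // intercepts are stored after the numFeatures * numClasses block
    var i = 0
    while (i < numClasses) {
      margins(i) += coefficients(numClasses * numFeatures + i)
      i += 1
    }
  }
  margins
}
```

The same layout is what makes the reviewer's `Matrix` suggestion natural: a column-major gradient matrix lets `foreachActive` callers stay agnostic of the flattened indexing.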
[GitHub] spark issue #15848: [SPARK-9487v2] Use the same num. worker threads in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15848 Can one of the admins verify this patch?