[GitHub] spark pull request: [SPARK-12616] [SQL] Adding a New Logical Opera...

2016-01-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10577#discussion_r48820132
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -595,6 +598,22 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
+ * Combines all adjacent [[Union]] and [[Unions]] operators into a single [[Unions]].
+ */
+object CombineUnions extends Rule[LogicalPlan] {
+  private def collectUnionChildren(plan: LogicalPlan): Seq[LogicalPlan] = plan match {
+    case Union(l, r) => collectUnionChildren(l) ++ collectUnionChildren(r)
--- End diff --

Another option would be to do this at construction time; that way we can avoid
paying the cost in the analyzer. This would still limit the cases we could
cache (i.e., we'd miss cached data unioned with other data), but that doesn't
seem like a huge deal.

I'd leave this rule here either way.
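
As an illustration for readers following the thread, here is a minimal sketch of how such a rule could be completed around the `collectUnionChildren` helper quoted above. It assumes the `Unions(children: Seq[LogicalPlan])` operator this PR proposes; the rule actually committed in the PR may differ in details.

```scala
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Union}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: `Unions` is the n-ary operator proposed in this PR, not an existing class.
object CombineUnions extends Rule[LogicalPlan] {
  // Gather the leaves of any chain of binary Union / n-ary Unions nodes.
  private def collectUnionChildren(plan: LogicalPlan): Seq[LogicalPlan] = plan match {
    case Union(left, right) => collectUnionChildren(left) ++ collectUnionChildren(right)
    case Unions(children)   => children.flatMap(collectUnionChildren)
    case other              => Seq(other)
  }

  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    // Replace the topmost union of each subtree with one flat Unions over all leaves.
    case u: Union  => Unions(collectUnionChildren(u))
    case u: Unions => Unions(collectUnionChildren(u))
  }
}
```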


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7675][ML][PYSpark] sparkml params type ...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9581#issuecomment-168930029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48746/
Test PASSed.





[GitHub] spark pull request: [SPARK-11878][SQL]: Eliminate distribute by in...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9858#issuecomment-168932871
  
**[Test build #48751 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48751/consoleFull)**
 for PR 9858 at commit 
[`dd2bdc8`](https://github.com/apache/spark/commit/dd2bdc8650e9db763ec3afe290919d8a15404e9d).





[GitHub] spark pull request: [SPARK-12570] [ML] [Doc] DecisionTreeRegressor...

2016-01-05 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/10594

[SPARK-12570] [ML] [Doc] DecisionTreeRegressor: provide variance of 
prediction: user guide update



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-12570

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10594


commit 94273d1512eded0148d02b7a76925ee4a40d8039
Author: Yanbo Liang 
Date:   2016-01-05T08:21:33Z

DecisionTreeRegressor: provide variance of prediction: user guide update







[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168932606
  
**[Test build #48747 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48747/consoleFull)**
 for PR 10583 at commit 
[`fb3b4a4`](https://github.com/apache/spark/commit/fb3b4a4c461391866bc12a51dd1e60eadeaff916).





[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48821710
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -240,6 +241,23 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
       }
     }
 
+    if (userSpecifiedSchema.isDefined && bucketSpec.isDefined) {
+      val BucketSpec(numBuckets, bucketColumns, sortColumns) = bucketSpec.get
+
+      tableProperties.put("spark.sql.sources.schema.numBuckets", numBuckets.toString)
+      tableProperties.put("spark.sql.sources.schema.numBucketCols", bucketColumns.length.toString)
+      bucketColumns.zipWithIndex.foreach { case (bucketCol, index) =>
+        tableProperties.put(s"spark.sql.sources.schema.bucketCol.$index", bucketCol)
+      }
+
+      if (sortColumns.isDefined) {
+        tableProperties.put("spark.sql.sources.schema.numSortCols", sortColumns.get.length.toString)
--- End diff --

Are we worried about the 4K limit on metastore table properties and, as a 
result, want to limit the size of each property?
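
For context on the 4K concern: the data source schema JSON is already stored split across several numbered properties so that no single metastore property value gets too large. A rough sketch of that chunking idea, under the assumption that the same trick would apply here (the 4000-character threshold is illustrative, not the exact config value):

```scala
import scala.collection.mutable

// Sketch: split one large value across numbered table properties, plus a count key.
def putLargeProperty(props: mutable.Map[String, String], key: String, value: String): Unit = {
  val threshold = 4000  // illustrative limit per property value
  val parts = value.grouped(threshold).toSeq
  props.put(s"$key.numParts", parts.size.toString)
  parts.zipWithIndex.foreach { case (part, index) =>
    props.put(s"$key.part.$index", part)
  }
}
```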






[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10595#issuecomment-168933998
  
Can you combine your pull requests into a single one?






[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48821401
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -240,6 +241,23 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
       }
     }
 
+    if (userSpecifiedSchema.isDefined && bucketSpec.isDefined) {
+      val BucketSpec(numBuckets, bucketColumns, sortColumns) = bucketSpec.get
+
+      tableProperties.put("spark.sql.sources.schema.numBuckets", numBuckets.toString)
+      tableProperties.put("spark.sql.sources.schema.numBucketCols", bucketColumns.length.toString)
+      bucketColumns.zipWithIndex.foreach { case (bucketCol, index) =>
+        tableProperties.put(s"spark.sql.sources.schema.bucketCol.$index", bucketCol)
+      }
+
+      if (sortColumns.isDefined) {
+        tableProperties.put("spark.sql.sources.schema.numSortCols", sortColumns.get.length.toString)
--- End diff --

It's only used to read the sorting columns back, which is the same technique 
we use to store partitioned columns.
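
A hedged sketch of the read path being described: reconstructing a `BucketSpec` from the numbered properties written in the hunk above. The sort-column key name is an assumption made by analogy with `bucketCol.$index`, since the quoted hunk is cut off before that line.

```scala
// Sketch only; not the PR's actual code.
case class BucketSpec(numBuckets: Int, bucketColumns: Seq[String], sortColumns: Option[Seq[String]])

def readBucketSpec(props: Map[String, String]): Option[BucketSpec] = {
  props.get("spark.sql.sources.schema.numBuckets").map { numBuckets =>
    val numBucketCols = props("spark.sql.sources.schema.numBucketCols").toInt
    val bucketCols = (0 until numBucketCols).map(i => props(s"spark.sql.sources.schema.bucketCol.$i"))
    // Assumed key names, mirroring bucketCol.$i.
    val sortCols = props.get("spark.sql.sources.schema.numSortCols").map { n =>
      (0 until n.toInt).map(i => props(s"spark.sql.sources.schema.sortCol.$i"))
    }
    BucketSpec(numBuckets.toInt, bucketCols, sortCols)
  }
}
```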





[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/10595

[STREAMING][MINOR] More contextual information in logs + minor code i…

…mprovements

Please review and merge at your convenience. Thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark streaming-minor-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10595


commit 62129336a171479b37edc347255a7be226fd2d22
Author: Jacek Laskowski 
Date:   2016-01-05T08:25:00Z

[STREAMING][MINOR] More contextual information in logs + minor code 
improvements







[GitHub] spark pull request: [SPARK-11878][SQL]: Eliminate distribute by in...

2016-01-05 Thread saucam
Github user saucam commented on the pull request:

https://github.com/apache/spark/pull/9858#issuecomment-168934153
  
Fixed






[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168929936
  
retest this please





[GitHub] spark pull request: [SPARK-7675][ML][PYSpark] sparkml params type ...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9581#issuecomment-168930028
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7675][ML][PYSpark] sparkml params type ...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9581#issuecomment-168929886
  
**[Test build #48746 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48746/consoleFull)**
 for PR 9581 at commit 
[`954f7c6`](https://github.com/apache/spark/commit/954f7c68f1e38aa80f33994f588c1bceb47679e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48820170
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -189,13 +220,43 @@ final class DataFrameWriter private[sql](df: DataFrame) {
         ifNotExists = false)).toRdd
   }
 
-  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
-    parCols.map { col =>
-      df.logicalPlan.output
-        .map(_.name)
-        .find(df.sqlContext.analyzer.resolver(_, col))
-        .getOrElse(throw new AnalysisException(s"Partition column $col not found in existing " +
-          s"columns (${df.logicalPlan.output.map(_.name).mkString(", ")})"))
+  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
+    cols.map(normalize(_, "Partition"))
+  }
+
+  private def normalizedBucketCols: Option[Seq[String]] = bucketingColumns.map { cols =>
+    cols.map(normalize(_, "Bucketing"))
+  }
+
+  private def normalizedSortCols: Option[Seq[String]] = sortingColumns.map { cols =>
+    cols.map(normalize(_, "Sorting"))
+  }
+
+  private def getBucketSpec: Option[BucketSpec] = {
+    if (sortingColumns.isDefined) {
+      require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
+    }
+
+    for {
+      n <- numBuckets
+      cols <- normalizedBucketCols
+    } yield {
+      require(n > 0, "Bucket number must be greater than 0.")
+      BucketSpec(n, cols, normalizedSortCols)
+    }
+  }
+
+  private def normalize(columnName: String, columnType: String): String = {
+    val validColumnNames = df.logicalPlan.output.map(_.name)
+    validColumnNames.find(df.sqlContext.analyzer.resolver(_, columnName))
+      .getOrElse(throw new AnalysisException(s"$columnType column $columnName not found in " +
+        s"existing columns (${validColumnNames.mkString(", ")})"))
+  }
+
+  private def assertNotBucketed(): Unit = {
+    if (numBuckets.isDefined || sortingColumns.isDefined) {
--- End diff --

I think sorting columns make no sense without bucketing columns, cc @nongli 
@yhuai 





[GitHub] spark pull request: [SPARK-11944][PYSPARK][MLLIB] python mllib.clu...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10150#issuecomment-168931946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48742/
Test PASSed.





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-168931880
  
**[Test build #48748 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48748/consoleFull)**
 for PR 10527 at commit 
[`0558bf8`](https://github.com/apache/spark/commit/0558bf8b698e9de7e19625627e487bfb3f33072d).





[GitHub] spark pull request: [SPARK-11944][PYSPARK][MLLIB] python mllib.clu...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10150#issuecomment-168931761
  
**[Test build #48742 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48742/consoleFull)**
 for PR 10150 at commit 
[`0310efe`](https://github.com/apache/spark/commit/0310efeec1a202733b40a50085178ec1b1d97409).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168931844
  
**[Test build #2322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2322/consoleFull)**
 for PR 10583 at commit 
[`fb3b4a4`](https://github.com/apache/spark/commit/fb3b4a4c461391866bc12a51dd1e60eadeaff916).





[GitHub] spark pull request: [SPARK-12581][SQL] Support case-sensitive tabl...

2016-01-05 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10523#issuecomment-168932900
  
@liancheng @yhuai Could you review this?





[GitHub] spark pull request: [SPARK-11944][PYSPARK][MLLIB] python mllib.clu...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10150#issuecomment-168931943
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10589#issuecomment-168932322
  
**[Test build #48738 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48738/consoleFull)**
 for PR 10589 at commit 
[`22afd1f`](https://github.com/apache/spark/commit/22afd1f0115b86cdb5ba661dd2c0714ff6a4243b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Benchmark(name: String, valuesPerIteration: Long, iters: Int = 
5) `
  * `  case class Case(name: String, fn: Int => Unit)`
  * `  case class Result(avgMs: Double, avgRate: Double)`
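
Going only by the class signatures listed above, usage of such a benchmark utility would presumably look something like the sketch below; the `addCase`/`run` method names and the output format are assumptions, not taken from the patch.

```scala
// Hypothetical usage sketch; only the constructor signatures are from the report above.
val valuesPerIteration = 10L * 1000 * 1000
val benchmark = new Benchmark("single int decode", valuesPerIteration)

benchmark.addCase("sum a column of ints") { iter =>
  var i = 0L
  var sum = 0L
  while (i < valuesPerIteration) { sum += i; i += 1 }
}

benchmark.run()  // expected to report per-case Result(avgMs, avgRate)-style numbers
```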





[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48820931
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -189,13 +220,43 @@ final class DataFrameWriter private[sql](df: DataFrame) {
         ifNotExists = false)).toRdd
   }
 
-  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
-    parCols.map { col =>
-      df.logicalPlan.output
-        .map(_.name)
-        .find(df.sqlContext.analyzer.resolver(_, col))
-        .getOrElse(throw new AnalysisException(s"Partition column $col not found in existing " +
-          s"columns (${df.logicalPlan.output.map(_.name).mkString(", ")})"))
+  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
+    cols.map(normalize(_, "Partition"))
+  }
+
+  private def normalizedBucketCols: Option[Seq[String]] = bucketingColumns.map { cols =>
+    cols.map(normalize(_, "Bucketing"))
+  }
+
+  private def normalizedSortCols: Option[Seq[String]] = sortingColumns.map { cols =>
+    cols.map(normalize(_, "Sorting"))
+  }
+
+  private def getBucketSpec: Option[BucketSpec] = {
+    if (sortingColumns.isDefined) {
+      require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
+    }
+
+    for {
+      n <- numBuckets
+      cols <- normalizedBucketCols
+    } yield {
+      require(n > 0, "Bucket number must be greater than 0.")
+      BucketSpec(n, cols, normalizedSortCols)
+    }
+  }
+
+  private def normalize(columnName: String, columnType: String): String = {
+    val validColumnNames = df.logicalPlan.output.map(_.name)
+    validColumnNames.find(df.sqlContext.analyzer.resolver(_, columnName))
+      .getOrElse(throw new AnalysisException(s"$columnType column $columnName not found in " +
+        s"existing columns (${validColumnNames.mkString(", ")})"))
+  }
+
+  private def assertNotBucketed(): Unit = {
+    if (numBuckets.isDefined || sortingColumns.isDefined) {
--- End diff --

Your point makes sense if you look at it from the implementation's 
perspective, but if I'm a user, why do I have to call bucketBy in order to use 
sortBy?






[GitHub] spark pull request: [SPARK-12616] [SQL] Adding a New Logical Opera...

2016-01-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10577#discussion_r48820999
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -595,6 +598,22 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
+ * Combines all adjacent [[Union]] and [[Unions]] operators into a single [[Unions]].
+ */
+object CombineUnions extends Rule[LogicalPlan] {
+  private def collectUnionChildren(plan: LogicalPlan): Seq[LogicalPlan] = plan match {
+    case Union(l, r) => collectUnionChildren(l) ++ collectUnionChildren(r)
--- End diff --

To do this at construction time, we would need to introduce a new DataFrame 
API `unionAll` that can combine more than two DataFrames, right? @marmbrus @rxin 

Is my understanding correct? Thank you!
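
If that reading is right, the construction-time flattening might look roughly like the sketch below: build one flat `Unions` node up front instead of a left-deep chain of binary `Union`s. Purely illustrative; `Unions` is the operator proposed in this PR, and the `DataFrame` construction details are glossed over.

```scala
// Illustrative sketch, not the PR's code: an n-ary unionAll that never nests.
def unionAll(first: DataFrame, others: DataFrame*): DataFrame = {
  val children = (first +: others).map(_.logicalPlan)
  // The optimizer then sees a single flat Unions instead of a chain of binary Unions.
  DataFrame(first.sqlContext, Unions(children))
}
```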





[GitHub] spark pull request: [SPARK-12393] [SparkR] Add read.text and write...

2016-01-05 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/10348#issuecomment-168933554
  
ping @shivaram @sun-rui @felixcheung 





[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48820313
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -189,13 +220,43 @@ final class DataFrameWriter private[sql](df: DataFrame) {
         ifNotExists = false)).toRdd
   }
 
-  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
-    parCols.map { col =>
-      df.logicalPlan.output
-        .map(_.name)
-        .find(df.sqlContext.analyzer.resolver(_, col))
-        .getOrElse(throw new AnalysisException(s"Partition column $col not found in existing " +
-          s"columns (${df.logicalPlan.output.map(_.name).mkString(", ")})"))
+  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
+    cols.map(normalize(_, "Partition"))
+  }
+
+  private def normalizedBucketCols: Option[Seq[String]] = bucketingColumns.map { cols =>
+    cols.map(normalize(_, "Bucketing"))
+  }
+
+  private def normalizedSortCols: Option[Seq[String]] = sortingColumns.map { cols =>
+    cols.map(normalize(_, "Sorting"))
+  }
+
+  private def getBucketSpec: Option[BucketSpec] = {
+    if (sortingColumns.isDefined) {
+      require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
+    }
+
+    for {
+      n <- numBuckets
+      cols <- normalizedBucketCols
+    } yield {
+      require(n > 0, "Bucket number must be greater than 0.")
+      BucketSpec(n, cols, normalizedSortCols)
+    }
+  }
+
+  private def normalize(columnName: String, columnType: String): String = {
+    val validColumnNames = df.logicalPlan.output.map(_.name)
+    validColumnNames.find(df.sqlContext.analyzer.resolver(_, columnName))
+      .getOrElse(throw new AnalysisException(s"$columnType column $columnName not found in " +
+        s"existing columns (${validColumnNames.mkString(", ")})"))
+  }
+
+  private def assertNotBucketed(): Unit = {
+    if (numBuckets.isDefined || sortingColumns.isDefined) {
--- End diff --

Isn't it the same as our normal DataFrame.sort? It still increases the 
compression ratio for Parquet.






[GitHub] spark pull request: [SPARK-12616] [SQL] Adding a New Logical Opera...

2016-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10577#discussion_r48820341
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -595,6 +598,22 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
+ * Combines all adjacent [[Union]] and [[Unions]] operators into a single [[Unions]].
+ */
+object CombineUnions extends Rule[LogicalPlan] {
+  private def collectUnionChildren(plan: LogicalPlan): Seq[LogicalPlan] = plan match {
+    case Union(l, r) => collectUnionChildren(l) ++ collectUnionChildren(r)
--- End diff --

+1





[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48820679
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -189,13 +220,43 @@ final class DataFrameWriter private[sql](df: DataFrame) {
         ifNotExists = false)).toRdd
   }
 
-  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
-    parCols.map { col =>
-      df.logicalPlan.output
-        .map(_.name)
-        .find(df.sqlContext.analyzer.resolver(_, col))
-        .getOrElse(throw new AnalysisException(s"Partition column $col not found in existing " +
-          s"columns (${df.logicalPlan.output.map(_.name).mkString(", ")})"))
+  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
+    cols.map(normalize(_, "Partition"))
+  }
+
+  private def normalizedBucketCols: Option[Seq[String]] = bucketingColumns.map { cols =>
+    cols.map(normalize(_, "Bucketing"))
+  }
+
+  private def normalizedSortCols: Option[Seq[String]] = sortingColumns.map { cols =>
+    cols.map(normalize(_, "Sorting"))
+  }
+
+  private def getBucketSpec: Option[BucketSpec] = {
+    if (sortingColumns.isDefined) {
+      require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
+    }
+
+    for {
+      n <- numBuckets
+      cols <- normalizedBucketCols
+    } yield {
+      require(n > 0, "Bucket number must be greater than 0.")
+      BucketSpec(n, cols, normalizedSortCols)
+    }
+  }
+
+  private def normalize(columnName: String, columnType: String): String = {
+    val validColumnNames = df.logicalPlan.output.map(_.name)
+    validColumnNames.find(df.sqlContext.analyzer.resolver(_, columnName))
+      .getOrElse(throw new AnalysisException(s"$columnType column $columnName not found in " +
+        s"existing columns (${validColumnNames.mkString(", ")})"))
+  }
+
+  private def assertNotBucketed(): Unit = {
+    if (numBuckets.isDefined || sortingColumns.isDefined) {
--- End diff --

If users just want to sort the data, they can call `DataFrame.sort` before 
writing. In this context, the `sortingColumns` are part of the bucketing 
information and should be used together with `bucketingColumns`.
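
As a usage sketch of the behavior being described (the `bucketBy`/`sortBy` names come from this discussion; the final API may differ):

```scala
// sortBy here orders rows *within each bucket file*, which is why it is tied to
// bucketBy; a global ordering would be DataFrame.sort before the write instead.
df.write
  .bucketBy(8, "user_id")           // hash user_id into 8 buckets
  .sortBy("event_time")             // keep rows sorted by event_time inside each bucket
  .saveAsTable("events_bucketed")   // illustrative table name
```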





[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168931722
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48741/
Test PASSed.





[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168931720
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168931526
  
**[Test build #48741 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48741/consoleFull)**
 for PR 10588 at commit 
[`b652b45`](https://github.com/apache/spark/commit/b652b4548fb2b9270f7ebd11397fdbc09a89f583).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-05 Thread nongli
GitHub user nongli opened a pull request:

https://github.com/apache/spark/pull/10593

[SPARK-12644][SQL] Update parquet reader to be vectorized.

This inlines a few of the Parquet decoders and adds vectorized APIs to 
support decoding in batch.
There are a few particulars in the Parquet encodings that make this much 
more efficient. In
particular, RLE encodings are very well suited for batch decoding. The 
Parquet 2.0 encodings are
also very suited for this.

This is a work in progress and does not affect the current execution. In 
subsequent patches, we will
support more encodings and types before enabling this.

Simple benchmarks indicate this can decode single ints more than 3x faster.
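
As a rough illustration of why run-length encodings suit batch decoding (this is not the PR's code; `IntColumnVector` is a stand-in for the patch's column-batch API):

```scala
// An RLE run is a (value, count) pair; decoding it in batch is one bulk fill into
// the destination vector instead of one virtual call per value.
final class IntColumnVector(capacity: Int) {
  private val data = new Array[Int](capacity)
  def putInts(destOffset: Int, count: Int, value: Int): Unit = {
    var i = 0
    while (i < count) { data(destOffset + i) = value; i += 1 }
  }
  def getInt(rowId: Int): Int = data(rowId)
}

def decodeRleRun(dest: IntColumnVector, destOffset: Int, runLength: Int, runValue: Int): Unit =
  dest.putInts(destOffset, runLength, runValue)
```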

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nongli/spark spark-12644

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10593


commit 7eeff58298ceac076779a5cae05ca674ed0ac51a
Author: Nong 
Date:   2015-12-31T22:45:30Z

[SPARK-12636][SQL] Update UnsafeRowParquetRecordReader to support reading 
paths directly.

As noted in the code, this change is to make this component easier to
test in isolation.

commit 22afd1f0115b86cdb5ba661dd2c0714ff6a4243b
Author: Nong 
Date:   2016-01-01T00:26:34Z

[SPARK-12640][SQL] Add simple benchmarking utility class and add Parquet 
scan benchmarks.

We've run benchmarks ad hoc to measure the scanner performance. We will 
continue to invest in this
and it makes sense to get these benchmarks into code. This adds a simple 
benchmarking utility to do
this.

commit 3e41ed43ebc16f4ea0f2a642dbf3a5e40a8bd0d9
Author: Nong 
Date:   2016-01-01T05:12:44Z

[SPARK-12635][SQL] Add ColumnarBatch, an in memory columnar format for 
execution.

There are many potential benefits of having an efficient in memory columnar 
format as an alternate
to UnsafeRow. This patch introduces ColumnarBatch/ColumnarVector which 
starts this effort. The
remaining implementation can be done as follow up patches.

As stated in the JIRA, there are useful external components that 
operate on memory in a
simple columnar format. ColumnarBatch would serve that purpose and could 
server as a
zero-serialization/zero-copy exchange for this use case.

This patch supports storing the underlying data either on heap or off heap. 
On heap runs a bit
faster but we would need offheap for zero-copy exchanges. Currently, this 
mode is hidden behind one
interface (ColumnVector).

This differs from Parquet or the existing columnar cache because this is 
*not* intended to be used
as a storage format. The focus is entirely on CPU efficiency as we expect 
to only have 1 of these
batches in memory per task.
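
A hedged sketch of the "one interface, two storage modes" idea described above; the class and method names are illustrative, not the patch's actual ColumnVector API.

```scala
import java.nio.ByteBuffer

// One read/write interface...
trait IntVector {
  def putInt(rowId: Int, value: Int): Unit
  def getInt(rowId: Int): Int
}

// ...backed by a plain JVM array (on-heap)...
final class OnHeapIntVector(capacity: Int) extends IntVector {
  private val data = new Array[Int](capacity)
  override def putInt(rowId: Int, value: Int): Unit = data(rowId) = value
  override def getInt(rowId: Int): Int = data(rowId)
}

// ...or by direct memory outside the GC heap (off-heap), useful for zero-copy exchange.
final class OffHeapIntVector(capacity: Int) extends IntVector {
  private val data = ByteBuffer.allocateDirect(capacity * 4)
  override def putInt(rowId: Int, value: Int): Unit = data.putInt(rowId * 4, value)
  override def getInt(rowId: Int): Int = data.getInt(rowId * 4)
}
```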

commit d99659d89a7709df8223ab86b1edd244b1e63086
Author: Nong 
Date:   2016-01-01T07:28:06Z

[SPARK-12644][SQL] Update parquet reader to be vectorized.

This inlines a few of the Parquet decoders and adds vectorized APIs to 
support decoding in batch.
There are a few particulars in the Parquet encodings that make this much 
more efficient. In
particular, RLE encodings are very well suited for batch decoding. The 
Parquet 2.0 encodings are
also very suited for this.

This is a work in progress and does not affect the current execution. In 
subsequent patches, we will
support more encodings and types before enabling this.

Simple benchmarks indicate this can decode single ints more than 3x faster.







[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10589#issuecomment-168932593
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10589#issuecomment-168932596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48738/
Test PASSed.





[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168933497
  
**[Test build #48750 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48750/consoleFull)**
 for PR 10588 at commit 
[`f3a557b`](https://github.com/apache/spark/commit/f3a557b5534c506e6987388a84ae4e561585d895).





[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10595#issuecomment-16890
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10593#issuecomment-168934496
  
**[Test build #48749 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48749/consoleFull)**
 for PR 10593 at commit 
[`d99659d`](https://github.com/apache/spark/commit/d99659d89a7709df8223ab86b1edd244b1e63086).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class VectorizedPlainValuesReader extends ValuesReader 
implements VectorizedValuesReader `
  * `public final class VectorizedRleValuesReader extends ValuesReader `





[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10593#issuecomment-168934501
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48749/
Test FAILed.





[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10593#issuecomment-168934499
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12641] Remove unused code related to Ha...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10590#issuecomment-168934843
  
**[Test build #48740 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48740/consoleFull)**
 for PR 10590 at commit 
[`4223cca`](https://github.com/apache/spark/commit/4223ccac9984d07aa858deb00caac4bba5ddc406).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12539][SQL] support writing bucketed ta...

2016-01-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10498#discussion_r48822540
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -189,13 +220,43 @@ final class DataFrameWriter private[sql](df: DataFrame) {
         ifNotExists = false)).toRdd
   }
 
-  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { parCols =>
-    parCols.map { col =>
-      df.logicalPlan.output
-        .map(_.name)
-        .find(df.sqlContext.analyzer.resolver(_, col))
-        .getOrElse(throw new AnalysisException(s"Partition column $col not found in existing " +
-          s"columns (${df.logicalPlan.output.map(_.name).mkString(", ")})"))
+  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
+    cols.map(normalize(_, "Partition"))
+  }
+
+  private def normalizedBucketCols: Option[Seq[String]] = bucketingColumns.map { cols =>
+    cols.map(normalize(_, "Bucketing"))
+  }
+
+  private def normalizedSortCols: Option[Seq[String]] = sortingColumns.map { cols =>
+    cols.map(normalize(_, "Sorting"))
+  }
+
+  private def getBucketSpec: Option[BucketSpec] = {
+    if (sortingColumns.isDefined) {
+      require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
+    }
+
+    for {
+      n <- numBuckets
+      cols <- normalizedBucketCols
+    } yield {
+      require(n > 0, "Bucket number must be greater than 0.")
+      BucketSpec(n, cols, normalizedSortCols)
+    }
+  }
+
+  private def normalize(columnName: String, columnType: String): String = {
+    val validColumnNames = df.logicalPlan.output.map(_.name)
+    validColumnNames.find(df.sqlContext.analyzer.resolver(_, columnName))
+      .getOrElse(throw new AnalysisException(s"$columnType column $columnName not found in " +
+        s"existing columns (${validColumnNames.mkString(", ")})"))
+  }
+
+  private def assertNotBucketed(): Unit = {
+    if (numBuckets.isDefined || sortingColumns.isDefined) {
--- End diff --

Maybe we need a better name than `sortBy`, to indicate that users need to give 
the columns that will be used to sort the data within each bucket.





[GitHub] spark pull request: [SPARK-12570] [ML] [Doc] DecisionTreeRegressor...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10594#issuecomment-168935497
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-05 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/10596

[SPARK-12401][SQL] Add integration tests for postgres enum types

We can handle PostgreSQL-specific enum types as strings in JDBC, so we should 
just add tests and close the corresponding JIRA ticket.
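
A rough sketch of the kind of integration test being described (it assumes a reachable PostgreSQL instance; `conn`, `jdbcUrl` and `sqlContext` would come from the existing docker-based integration test harness, and the type/table names are illustrative):

```scala
// Enum columns should come back through the JDBC data source as plain strings.
conn.prepareStatement("CREATE TYPE mood AS ENUM ('happy', 'sad')").executeUpdate()
conn.prepareStatement("CREATE TABLE people (name TEXT, current_mood mood)").executeUpdate()
conn.prepareStatement("INSERT INTO people VALUES ('alice', 'happy')").executeUpdate()

val df = sqlContext.read.jdbc(jdbcUrl, "people", new java.util.Properties)
assert(df.schema("current_mood").dataType == org.apache.spark.sql.types.StringType)
assert(df.collect().map(_.getString(1)).toSet == Set("happy"))
```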

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark AddTestsInIntegration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10596


commit 6460aca95eccd1249d03546b9deeb90f3f5f02e9
Author: Takeshi YAMAMURO 
Date:   2016-01-05T04:41:13Z

Add tests for postgres enum types







[GitHub] spark pull request: [SPARK-12570] [ML] [Doc] DecisionTreeRegressor...

2016-01-05 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10594#discussion_r48824613
  
--- Diff: docs/ml-classification-regression.md ---
@@ -535,7 +535,9 @@ The main differences between this API and the [original MLlib Decision Tree API]
 * use of DataFrame metadata to distinguish continuous and categorical features
 
 
-The Pipelines API for Decision Trees offers a bit more functionality than the original API.  In particular, for classification, users can get the predicted probability of each class (a.k.a. class conditional probabilities).
+The Pipelines API for Decision Trees offers a bit more functionality than the original API.  
+In particular, for classification, users can get the predicted probability of each class (a.k.a. class conditional probabilities); 
--- End diff --

My bad, I didn't understand the sentence correctly.





[GitHub] spark pull request: [SPARK-12625][SPARKR][SQL] replace R usage of ...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10584#issuecomment-168940130
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48732/
Test FAILed.





[GitHub] spark pull request: [SPARK-12616] [SQL] Adding a New Logical Opera...

2016-01-05 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10577#issuecomment-168943487
  
Todo: 
  - Will add the new `DataFrame` and `Dataset` APIs for `unionAll`, if my understanding is correct.
  - Will add another rule for pushing `Filter` and `Project` through `Unions` (see the sketch after this message).

Thanks!
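
For the second item, a minimal sketch of what pushing a `Filter` through the flattened operator could look like; it assumes the `Unions(children: Seq[LogicalPlan])` node from this PR and ignores the attribute rewriting a real rule needs when the children's output attributes differ:

```scala
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: a real rule must rewrite `condition` against each child's output.
object PushFilterThroughUnions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(condition, Unions(children)) =>
      Unions(children.map(child => Filter(condition, child)))
  }
}
```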





[GitHub] spark pull request: [SPARK-11579] [ML] avoid creating new optimize...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9614#issuecomment-168946249
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11579] [ML] avoid creating new optimize...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9614#issuecomment-168945836
  
**[Test build #48753 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48753/consoleFull)**
 for PR 9614 at commit 
[`dcf0d8f`](https://github.com/apache/spark/commit/dcf0d8ff111cdb6812cb5ff74d0119331270b644).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168952375
  
**[Test build #48750 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48750/consoleFull)**
 for PR 10588 at commit 
[`f3a557b`](https://github.com/apache/spark/commit/f3a557b5534c506e6987388a84ae4e561585d895).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11878][SQL]: Eliminate distribute by in...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9858#issuecomment-168952176
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48751/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168955204
  
**[Test build #2322 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2322/consoleFull)**
 for PR 10583 at commit 
[`fb3b4a4`](https://github.com/apache/spark/commit/fb3b4a4c461391866bc12a51dd1e60eadeaff916).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] Scaladoc fixes...mostly

2016-01-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10592#issuecomment-168954471
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...

2016-01-05 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9589#discussion_r48829056
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -151,29 +152,34 @@ private[hive] class ClientWrapper(
 // Switch to the initClassLoader.
 Thread.currentThread().setContextClassLoader(initClassLoader)
 val ret = try {
-  val initialConf = new HiveConf(classOf[SessionState])
-  // HiveConf is a Hadoop Configuration, which has a field of 
classLoader and
-  // the initial value will be the current thread's context class 
loader
-  // (i.e. initClassLoader at here).
-  // We call initialConf.setClassLoader(initClassLoader) at here to 
make
-  // this action explicit.
-  initialConf.setClassLoader(initClassLoader)
-  config.foreach { case (k, v) =>
-if (k.toLowerCase.contains("password")) {
-  logDebug(s"Hive Config: $k=xxx")
-} else {
-  logDebug(s"Hive Config: $k=$v")
+  val registeredState = SessionState.get
+  if (registeredState != null && 
registeredState.isInstanceOf[CliSessionState]) {
--- End diff --

When we have a `CliSessionState`, we are using the Spark SQL CLI, and in that 
case we never need a second `SessionState` here. Creating another `SessionState` 
would fail some cases, since `CliSessionState` inherits from `SessionState` and 
the mismatch could lead to a `ClassCastException`.
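A minimal sketch of the reuse-or-create logic described above (the names and the 
exact shape are assumptions drawn from this discussion, not the actual patch):

```scala
import org.apache.hadoop.hive.cli.CliSessionState
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Stand-in for the class loader held by ClientWrapper in the real code.
val initClassLoader = Thread.currentThread().getContextClassLoader

val state = SessionState.get match {
  // The Spark SQL CLI already installed a CliSessionState: reuse it rather than
  // creating a plain SessionState, which later casts to CliSessionState would break on.
  case cli: CliSessionState => cli
  case _ =>
    val initialConf = new HiveConf(classOf[SessionState])
    initialConf.setClassLoader(initClassLoader)
    new SessionState(initialConf)
}
```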


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12453][Streaming] Remove explicit depen...

2016-01-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10492#issuecomment-168958835
  
Given the discussion here, I'm pretty confident in this change and would 
like to go ahead and merge it. It will also unblock further fixes in 12269.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-168963321
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12470] [SQL] Fix size reduction calcula...

2016-01-05 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/10421#issuecomment-168963270
  
I have a fix for the test failure. Should I create a new Jira and PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9372] [SQL] Filter nulls in Inner joins...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9451#issuecomment-168963367
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9372] [SQL] Filter nulls in Inner joins...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9451#issuecomment-168963214
  
**[Test build #48754 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48754/consoleFull)**
 for PR 9451 at commit 
[`cd8ca34`](https://github.com/apache/spark/commit/cd8ca343019d1e7a2a43128ea070f9cda828dc81).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-168963322
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48748/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/10595#discussion_r48830059
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobSet.scala ---
@@ -59,17 +59,15 @@ case class JobSet(
 
   // Time taken to process all the jobs from the time they were submitted
   // (i.e. including the time they wait in the streaming scheduler queue)
-  def totalDelay: Long = {
-processingEndTime - time.milliseconds
-  }
+  def totalDelay: Long = processingEndTime - time.milliseconds
--- End diff --

Noted & thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10596#issuecomment-168967487
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12340][SQL]fix Int overflow in the Spar...

2016-01-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10562#issuecomment-168967549
  
@QiangCai I think the test failures are unrelated. However, before we can 
retest you'll have to rebase, as there is a merge conflict now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10596#issuecomment-168967490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48755/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12570] [ML] [Doc] DecisionTreeRegressor...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10594#issuecomment-168935501
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48752/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12641] Remove unused code related to Ha...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10590#issuecomment-168935787
  
**[Test build #2321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2321/consoleFull)**
 for PR 10590 at commit 
[`ffb9fb0`](https://github.com/apache/spark/commit/ffb9fb001b2fe848a7fb4ca4f250dbe206bae0e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12570] [ML] [Doc] DecisionTreeRegressor...

2016-01-05 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10594#discussion_r48822844
  
--- Diff: docs/ml-classification-regression.md ---
@@ -535,7 +535,9 @@ The main differences between this API and the [original 
MLlib Decision Tree API]
 * use of DataFrame metadata to distinguish continuous and categorical 
features
 
 
-The Pipelines API for Decision Trees offers a bit more functionality than 
the original API.  In particular, for classification, users can get the 
predicted probability of each class (a.k.a. class conditional probabilities).
+The Pipelines API for Decision Trees offers a bit more functionality than 
the original API.  
+In particular, for classification, users can get the predicted probability 
of each class (a.k.a. class conditional probabilities); 
--- End diff --

Line ends with ";".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10658][SPARK-11421][PYSPARK][CORE] Prov...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9313#issuecomment-168936205
  
**[Test build #48745 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48745/consoleFull)**
 for PR 9313 at commit 
[`bf3e98f`](https://github.com/apache/spark/commit/bf3e98f07097b21066fcd681c437998ce65a1379).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10658][SPARK-11421][PYSPARK][CORE] Prov...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9313#issuecomment-168936339
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10906][MLlib] More efficient SparseMatr...

2016-01-05 Thread rahulpalamuttam
Github user rahulpalamuttam commented on the pull request:

https://github.com/apache/spark/pull/8960#issuecomment-168942081
  
@jkbradley @mengxr 
I have opened the new PR in Breeze here:
https://github.com/scalanlp/breeze/pull/480



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12645] [SparkR] SparkR support hash fun...

2016-01-05 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/10597

[SPARK-12645] [SparkR] SparkR support hash function

Add ```hash``` function for SparkR ```DataFrame```.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-12645

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10597.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10597


commit c41eb1fd364c52d9eae0469229e0eb850c03c57a
Author: Yanbo Liang 
Date:   2016-01-05T09:42:55Z

SparkR support hash function




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11878][SQL]: Eliminate distribute by in...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9858#issuecomment-168952173
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168952844
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10595#discussion_r48828208
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -286,7 +286,7 @@ abstract class DStream[T: ClassTag] (
 dependencies.foreach(_.validateAtStart())
 
 logInfo("Slide time = " + slideDuration)
--- End diff --

It'd be fine to make this all use interpolation while you're at it
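For example, mirroring the quoted line (a trivial sketch):

```scala
logInfo("Slide time = " + slideDuration)   // current: string concatenation
logInfo(s"Slide time = $slideDuration")    // suggested: Scala string interpolation
```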


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10595#discussion_r48828188
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobSet.scala ---
@@ -59,17 +59,15 @@ case class JobSet(
 
   // Time taken to process all the jobs from the time they were submitted
   // (i.e. including the time they wait in the streaming scheduler queue)
-  def totalDelay: Long = {
-processingEndTime - time.milliseconds
-  }
+  def totalDelay: Long = processingEndTime - time.milliseconds
 
   def toBatchInfo: BatchInfo = {
 BatchInfo(
   time,
   streamIdToInputInfo,
   submissionTime,
-  if (processingStartTime >= 0) Some(processingStartTime) else None,
-  if (processingEndTime >= 0) Some(processingEndTime) else None,
+  if (hasStarted) Some(processingStartTime) else None,
--- End diff --

These change the logic slightly -- are you sure it's equivalent?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10595#discussion_r48828135
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobSet.scala ---
@@ -59,17 +59,15 @@ case class JobSet(
 
   // Time taken to process all the jobs from the time they were submitted
   // (i.e. including the time they wait in the streaming scheduler queue)
-  def totalDelay: Long = {
-processingEndTime - time.milliseconds
-  }
+  def totalDelay: Long = processingEndTime - time.milliseconds
--- End diff --

Although I wouldn't bother with this kind of change, it's OK here IMHO


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12638] [API DOC] Parameter explaination...

2016-01-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10587#issuecomment-168955492
  
I like this, but how about adding similar docs to other methods like 
treeAggregate, fold, etc.? The semantics of fold were brought up just last 
week, for example. Your point about it being per-partition is quite pertinent 
there too.
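As a concrete illustration of the per-partition point for `fold` (an assumed 
example, not taken from this PR; `sc` is a `SparkContext`):

```scala
val rdd = sc.parallelize(1 to 4, numSlices = 2)

rdd.fold(0)(_ + _)   // 10
// The zero value is folded into every partition and once more when the
// per-partition results are combined, so a non-neutral zero shifts the result:
rdd.fold(1)(_ + _)   // 13 = (1 + 1 + 2) + (1 + 3 + 4) + 1
```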


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [CORE][MINOR] scaladoc fixes

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10591#issuecomment-168955273
  
**[Test build #2323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2323/consoleFull)**
 for PR 10591 at commit 
[`a23cfcf`](https://github.com/apache/spark/commit/a23cfcf8375c132c8d79c3c0ead3d0c317966f16).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9372] [SQL] Filter nulls in Inner joins...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9451#issuecomment-168963368
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48754/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/10595#discussion_r48830663
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobSet.scala ---
@@ -59,17 +59,15 @@ case class JobSet(
 
   // Time taken to process all the jobs from the time they were submitted
   // (i.e. including the time they wait in the streaming scheduler queue)
-  def totalDelay: Long = {
-processingEndTime - time.milliseconds
-  }
+  def totalDelay: Long = processingEndTime - time.milliseconds
 
   def toBatchInfo: BatchInfo = {
 BatchInfo(
   time,
   streamIdToInputInfo,
   submissionTime,
-  if (processingStartTime >= 0) Some(processingStartTime) else None,
-  if (processingEndTime >= 0) Some(processingEndTime) else None,
+  if (hasStarted) Some(processingStartTime) else None,
--- End diff --

Tested it locally (and can't wait to see the results from Jenkins).

The current code assumes that the times can be `0` (which they never can 
be). With this change it is also clearer that once `hasCompleted` holds, 
`processingEndTime` is already set. The current code is over-complicated as it 
stands, IMHO.
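A stand-alone sketch of the idea (`JobSetSketch` is an assumed shape, not the 
real `JobSet`; the exact threshold, `> 0` vs `>= 0`, is precisely what the 
question above is about):

```scala
// The processing times default to -1L and only become non-negative once set,
// so simple predicates can stand in for the explicit comparisons.
case class JobSetSketch(submissionTime: Long,
                        processingStartTime: Long = -1L,
                        processingEndTime: Long = -1L) {
  def hasStarted: Boolean = processingStartTime > 0
  def hasCompleted: Boolean = processingEndTime > 0
  def startOption: Option[Long] = if (hasStarted) Some(processingStartTime) else None
  def endOption: Option[Long] = if (hasCompleted) Some(processingEndTime) else None
}
```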


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168965051
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48747/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168965050
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12641] Remove unused code related to Ha...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10590#issuecomment-168935012
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11579] [ML] avoid creating new optimize...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9614#issuecomment-168934988
  
**[Test build #48753 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48753/consoleFull)**
 for PR 9614 at commit 
[`dcf0d8f`](https://github.com/apache/spark/commit/dcf0d8ff111cdb6812cb5ff74d0119331270b644).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12641] Remove unused code related to Ha...

2016-01-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10590#issuecomment-168935005
  
I've merged this. Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12641] Remove unused code related to Ha...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10590#issuecomment-168935015
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48740/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11579] [ML] avoid creating new optimize...

2016-01-05 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/9614#issuecomment-168935901
  
@avulanov Thanks for the review. 
The only concern I have is that LBFGSOptimizer/SGDOptimizer look like 
getters, yet they act as setters. 

I sent an update which changes the function names only. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9372] [SQL] Filter nulls in Inner joins...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9451#issuecomment-168937118
  
**[Test build #48754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48754/consoleFull)**
 for PR 9451 at commit 
[`cd8ca34`](https://github.com/apache/spark/commit/cd8ca343019d1e7a2a43128ea070f9cda828dc81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12393] [SparkR] Add read.text and write...

2016-01-05 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/10348#issuecomment-168938461
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12616] [SQL] Adding a New Logical Opera...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10577#issuecomment-168943889
  
**[Test build #48756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48756/consoleFull)**
 for PR 10577 at commit 
[`c1f66f7`](https://github.com/apache/spark/commit/c1f66f744fce35eb657f9ec8a971dbd5449d0985).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11878][SQL]: Eliminate distribute by in...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9858#issuecomment-168951738
  
**[Test build #48751 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48751/consoleFull)**
 for PR 9858 at commit 
[`dd2bdc8`](https://github.com/apache/spark/commit/dd2bdc8650e9db763ec3afe290919d8a15404e9d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12480][follow-up] use a single column v...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10588#issuecomment-168952849
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48750/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] Scaladoc fixes...mostly

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10592#issuecomment-168955445
  
**[Test build #2324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2324/consoleFull)**
 for PR 10592 at commit 
[`fa65c0d`](https://github.com/apache/spark/commit/fa65c0d69ca8ec97edb63353c34dfc5cdd04dacf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-168963051
  
**[Test build #48748 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48748/consoleFull)**
 for PR 10527 at commit 
[`0558bf8`](https://github.com/apache/spark/commit/0558bf8b698e9de7e19625627e487bfb3f33072d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class AesEncrypt(left: Expression, right: Expression)`
  * `case class AesDecrypt(left: Expression, right: Expression)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12645] [SparkR] SparkR support hash fun...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10597#issuecomment-168963887
  
**[Test build #48757 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48757/consoleFull)**
 for PR 10597 at commit 
[`c41eb1f`](https://github.com/apache/spark/commit/c41eb1fd364c52d9eae0469229e0eb850c03c57a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12645] [SparkR] SparkR support hash fun...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10597#issuecomment-168963978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48757/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12645] [SparkR] SparkR support hash fun...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10597#issuecomment-168963977
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [STREAMING][MINOR] More contextual information...

2016-01-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/10595#discussion_r48830750
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -286,7 +286,7 @@ abstract class DStream[T: ClassTag] (
 dependencies.foreach(_.validateAtStart())
 
 logInfo("Slide time = " + slideDuration)
--- End diff --

Thanks! I was thinking about it, but was worried about proposing such changes 
as you might not like them :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12573][SPARK-12574][SQL] Move SQL Parse...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10583#issuecomment-168964867
  
**[Test build #48747 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48747/consoleFull)**
 for PR 10583 at commit 
[`fb3b4a4`](https://github.com/apache/spark/commit/fb3b4a4c461391866bc12a51dd1e60eadeaff916).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10596#issuecomment-168967352
  
**[Test build #48755 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48755/consoleFull)**
 for PR 10596 at commit 
[`6460aca`](https://github.com/apache/spark/commit/6460aca95eccd1249d03546b9deeb90f3f5f02e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12641] Remove unused code related to Ha...

2016-01-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10590


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10593#issuecomment-168934389
  
**[Test build #48749 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48749/consoleFull)**
 for PR 10593 at commit 
[`d99659d`](https://github.com/apache/spark/commit/d99659d89a7709df8223ab86b1edd244b1e63086).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


