date:20160130

[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177098340
  
**[Test build #50442 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50442/consoleFull)**
 for PR 10982 at commit 
[`d1bb199`](https://github.com/apache/spark/commit/d1bb1997497a8a1b3f18c47bb0c394d4bf3029f3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...

2016-01-30 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10608#issuecomment-177098407
  
It's a bit hard to know whether the repl changes make sense or not, but I 
think we just need to try it out and see if problems come up.

 LGTM.




[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177098416
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50440/
Test FAILed.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177109563
  
**[Test build #50443 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50443/consoleFull)**
 for PR 10982 at commit 
[`964193d`](https://github.com/apache/spark/commit/964193d920bf494148bbd0deee58c4d1e6dc3327).



[GitHub] spark pull request: [SPARK-12982][SQL] Add table name validation i...

2016-01-30 Thread jayadevanmurali

Github user jayadevanmurali commented on the pull request:

https://github.com/apache/spark/pull/10983#issuecomment-177109715
  
@hvanhovell 
I was able to reproduce this in Spark 2.0.0-SNAPSHOT.

Steps:
ayadevan@Satellite-L640:~/spark$ ./bin/spark-shell 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/30 14:19:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/30 14:19:22 WARN Utils: Your hostname, Satellite-L640 resolves to a loopback address: 127.0.1.1; using 100.86.225.72 instead (on interface ppp0)
16/01/30 14:19:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc (master = local[*], app id = local-1454143767817).
SQL context available as sqlContext.

scala> import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

scala> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}

scala> val rows = List(Row("foo"), Row("bar"));
rows: List[org.apache.spark.sql.Row] = List([foo], [bar])

scala> val schema = StructType(Seq(StructField("col", StringType)));
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,StringType,true))

scala> val rdd = sc.parallelize(rows);
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at :32

scala> val df = sqlContext.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.registerTempTable("t~")

scala> df.sqlContext.dropTempTable("t~")
java.lang.RuntimeException: [1.2] failure: ``.'' expected but `~' found

t~
 ^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:836)
at org.apache.spark.sql.SQLContext.dropTempTable(SQLContext.scala:763)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:39)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:44)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:46)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:48)
at $iwC$$iwC$$iwC$$iwC.(:50)
at $iwC$$iwC$$iwC.(:52)
at $iwC$$iwC.(:54)
at $iwC.(:56)
at (:58)
at .(:62)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at

[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-177129741
  
**[Test build #50439 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50439/consoleFull)**
 for PR 10910 at commit 
[`35e08c7`](https://github.com/apache/spark/commit/35e08c7d3f7b89a04405795aa806cf5bbf76d9ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread wangyang1992

GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/10994

[SPARK-13100] [SQL] improving the performance of stringToDate method in 
DateTimeUtils.scala

Use an instance variable to hold a GMT TimeZone object instead of 
instantiating it on every call.
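
The change described above amounts to hoisting a repeated lookup into a field. A minimal sketch of that pattern follows; it assumes nothing about the actual `DateTimeUtils` code, and the object and method names (`GmtCalendars`, `perCall`, `cached`) are hypothetical.

```scala
import java.util.{Calendar, TimeZone}

// Hypothetical names for illustration; this is not the code from the pull request.
object GmtCalendars {
  // Resolved once, when the object is initialized, and shared afterwards.
  private val Gmt: TimeZone = TimeZone.getTimeZone("GMT")

  // Before: looks the zone up again on every call.
  def perCall(): Calendar = Calendar.getInstance(TimeZone.getTimeZone("GMT"))

  // After: reuses the cached zone object.
  def cached(): Calendar = Calendar.getInstance(Gmt)
}
```

One caveat with this approach: `java.util.TimeZone` is mutable, so the shared instance has to be treated as read-only by every caller.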

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark datetimeUtil

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10994.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10994


commit 19defc9c83da6206288c7ee70ce97f2e08603f72
Author: wangyang 
Date:   2016-01-30T08:33:40Z

improving the performance of stringToDate method in DateTimeUtils.scala





[GitHub] spark pull request: [SPARK-12982][SQL] Add table name validation i...

2016-01-30 Thread jayadevanmurali

Github user jayadevanmurali commented on a diff in the pull request:

https://github.com/apache/spark/pull/10983#discussion_r51342667
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -747,7 +747,7 @@ class SQLContext private[sql](
* only during the lifetime of this instance of SQLContext.
*/
   private[sql] def registerDataFrameAsTable(df: DataFrame, tableName: String): Unit = {
-catalog.registerTable(TableIdentifier(tableName), df.logicalPlan)
+catalog.registerTable(SqlParser.parseTableIdentifier(tableName), df.logicalPlan)
--- End diff --

OK, I can see the variable definition at line 211 of SQLContext.scala:

    @transient
    protected[sql] val sqlParser = new SparkSQLParser(getSQLDialect().parse(_))

But this variable is not used anywhere. All methods use 
SqlParser.parseTableIdentifier(). For example:

    @Experimental
    def createExternalTable(
        tableName: String,
        source: String,
        options: Map[String, String]): DataFrame = {
      **val tableIdent = SqlParser.parseTableIdentifier(tableName)**
      val cmd =
        CreateTableUsing(
          tableIdent,
          userSpecifiedSchema = None,
          source,
          temporary = false,
          options,
          allowExisting = false,
          managedIfNoPath = false)
      executePlan(cmd).toRdd
      table(tableIdent)
    }

Correct me if I am wrong.
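
To make the asymmetry under discussion concrete, here is a toy model of it. It is not Spark's code: `TableIdent`, `ToyParser`, and `ToyCatalog` are hypothetical stand-ins, and the identifier grammar is a deliberately simplified assumption. Registration that wraps the raw string accepts a name such as `t~`, while any later operation that parses the name rejects it; parsing at registration time surfaces the error up front.

```scala
// Hypothetical stand-ins for illustration; these are NOT Spark's classes.
final case class TableIdent(table: String, database: Option[String] = None)

object ToyParser {
  // Simplified assumption: identifiers are letters, digits and underscores.
  private val Part = "[A-Za-z_][A-Za-z0-9_]*"
  private val Qualified = s"($Part)\\.($Part)".r
  private val Simple = s"($Part)".r

  def parseTableIdentifier(name: String): TableIdent = name match {
    case Qualified(db, table) => TableIdent(table, Some(db))
    case Simple(table)        => TableIdent(table)
    case _                    => sys.error(s"invalid table name: $name")
  }
}

final class ToyCatalog {
  private val tables = scala.collection.mutable.Map.empty[TableIdent, String]

  // Unvalidated path: wraps the raw string, so any name is accepted.
  def registerUnchecked(name: String, plan: String): Unit =
    tables(TableIdent(name)) = plan

  // Validated path: parses first, so invalid names fail at registration.
  def registerChecked(name: String, plan: String): Unit =
    tables(ToyParser.parseTableIdentifier(name)) = plan

  // Dropping always parses the name, as in the stack trace reported earlier.
  def drop(name: String): Unit =
    tables.remove(ToyParser.parseTableIdentifier(name))

  def size: Int = tables.size
}
```

With the unchecked path, `registerUnchecked("t~", ...)` succeeds and the later `drop("t~")` throws, leaving a table that cannot be dropped by name; with the checked path the bad name never enters the catalog.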



[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-177111484
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-177111489
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50438/
Test PASSed.



[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...

2016-01-30 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10608#issuecomment-177098650
  
Merging this in master. Hopefully compilation will be faster.




[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177098415
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177106034
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177114527
  
**[Test build #50444 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50444/consoleFull)**
 for PR 10994 at commit 
[`19defc9`](https://github.com/apache/spark/commit/19defc9c83da6206288c7ee70ce97f2e08603f72).



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177098020
  
**[Test build #50441 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50441/consoleFull)**
 for PR 10942 at commit 
[`925827b`](https://github.com/apache/spark/commit/925827bc01e484c1d1ffb584fde86324b0640ca2).



[GitHub] spark pull request: [SPARK-13070][SQL] Better error message when P...

2016-01-30 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10979#issuecomment-177099376
  
cc @liancheng 



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51342432
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +61,141 @@ class BucketedReadSuite extends QueryTest with 
SQLTestUtils with TestHiveSinglet
 }
   }
 
+  // To verify if pruning works, we compare the results before filtering
+  private def checkPrunedAnswers(
+  sourceDataFrame: DataFrame,
+  filterCondition: Column,
+  expectedAnswer: DataFrame): Unit = {
+val filter = 
sourceDataFrame.filter(filterCondition).queryExecution.executedPlan
+assert(
+  filter.isInstanceOf[execution.Filter] ||
+  (filter.isInstanceOf[WholeStageCodegen] &&
+
filter.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[execution.Filter]))
+checkAnswer(
+  expectedAnswer.orderBy(expectedAnswer.logicalPlan.output.map(attr => 
Column(attr)) : _*),
+  filter.children.head.executeCollectPublic().sortBy(_.toString()))
+  }
+
+  test("read partitioning bucketed tables with bucket pruning filters") {
+val df = (10 until 50).map(i => (i % 5, i % 13 + 10, 
i.toString)).toDF("i", "j", "k")
+
+withTable("bucketed_table") {
+  // The number of buckets should be large enough to make sure each 
bucket contains
+  // at most one bucketing key value.
+  // json does not support predicate push-down, and thus json is used 
here
--- End diff --

Does it mean bucket pruning is not very useful for parquet?
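
As background for the question above, the general idea of bucket pruning that the test exercises can be sketched in plain Scala. This is only an illustration: `bucketOf` is a hypothetical bucketing function (Spark's real hash is different), and the in-memory `Map` stands in for bucket files on disk.

```scala
object BucketPruningSketch {
  type Row = (Int, String)

  // Hypothetical bucketing function; Spark's actual hash differs.
  def bucketOf(key: Int, numBuckets: Int): Int =
    java.lang.Math.floorMod(key, numBuckets)

  // "Write" the rows, grouped by the bucket of their key.
  def writeBucketed(rows: Seq[Row], numBuckets: Int): Map[Int, Seq[Row]] =
    rows.groupBy { case (key, _) => bucketOf(key, numBuckets) }

  // An equality filter on the bucketing key only needs the one bucket the key
  // hashes to. Returns the matching rows and the number of rows scanned.
  def readWithPruning(
      buckets: Map[Int, Seq[Row]],
      numBuckets: Int,
      key: Int): (Seq[Row], Int) = {
    val scanned = buckets.getOrElse(bucketOf(key, numBuckets), Seq.empty)
    (scanned.filter(_._1 == key), scanned.size)
  }
}
```

When the number of buckets is large enough that each bucket holds at most one key value, as the test comment requires, the rows scanned after pruning are exactly the rows that match the filter.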



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51342422
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +61,141 @@ class BucketedReadSuite extends QueryTest with 
SQLTestUtils with TestHiveSinglet
 }
   }
 
+  // To verify if pruning works, we compare the results before filtering
+  private def checkPrunedAnswers(
+  sourceDataFrame: DataFrame,
+  filterCondition: Column,
+  expectedAnswer: DataFrame): Unit = {
+val filter = 
sourceDataFrame.filter(filterCondition).queryExecution.executedPlan
+assert(
+  filter.isInstanceOf[execution.Filter] ||
+  (filter.isInstanceOf[WholeStageCodegen] &&
--- End diff --

damn forgot about the `WholeStageCodegen` stuff. How about we call 
`filter.find` to get the underlying relation operator directly?
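
The suggestion can be illustrated with a toy plan tree. The classes below are hypothetical and only mimic the shape of a physical plan; whether the real `find` descends through a `WholeStageCodegen` node depends on how that operator exposes its children, which this sketch simply assumes.

```scala
// Hypothetical plan nodes for illustration; not Spark's SparkPlan hierarchy.
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order search: returns the first node satisfying the predicate.
  def find(p: Plan => Boolean): Option[Plan] =
    if (p(this)) Some(this)
    else children.foldLeft(Option.empty[Plan])((acc, c) => acc.orElse(c.find(p)))
}

final case class Scan(table: String) extends Plan {
  def children: Seq[Plan] = Nil
}

final case class Filter(condition: String, child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

// Assumption: the wrapper exposes the wrapped plan as its child.
final case class Codegen(plan: Plan) extends Plan {
  def children: Seq[Plan] = Seq(plan)
}
```

Searching for the scan node directly gives the same result whether or not the filter is wrapped, so the test no longer needs to special-case the wrapper with nested `isInstanceOf` checks.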



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177100554
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177100558
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50442/
Test FAILed.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177100464
  
**[Test build #50442 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50442/consoleFull)**
 for PR 10982 at commit 
[`d1bb199`](https://github.com/apache/spark/commit/d1bb1997497a8a1b3f18c47bb0c394d4bf3029f3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51342412
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +61,141 @@ class BucketedReadSuite extends QueryTest with 
SQLTestUtils with TestHiveSinglet
 }
   }
 
+  // To verify if pruning works, we compare the results before filtering
+  private def checkPrunedAnswers(
+  sourceDataFrame: DataFrame,
+  filterCondition: Column,
--- End diff --

instead of having these 2 parameters, how about we just ask caller to pass 
in a `filteredDataFrame`?



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177102470
  
@gatorsmile thanks for your work, it's very close now :)



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177108672
  
Jenkins, this is ok to test.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10982#discussion_r51342217
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import org.apache.spark.sql.AnalysisException
+
+
+/**
+ * Interface for the system catalog (of columns, partitions, tables, and 
databases).
+ *
+ * This is only used for non-temporary items, and implementations must be 
thread-safe as they
+ * can be accessed in multiple threads.
+ */
+abstract class Catalog {
+
+  // 
--
+  // Databases
+  // 
--
+
+  def createDatabase(dbDefinition: Database, ifNotExists: Boolean): Unit
--- End diff --

need to define when we should throw exceptions in api contract
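
As an illustration of what such a contract could look like, here is one possible wording together with a small in-memory implementation. Everything here is an assumption: the `Database` fields, the exception type, and the chosen semantics are hypothetical and are not what the pull request defines.

```scala
// Hypothetical types for illustration; the PR's Database class may carry more fields.
final case class Database(name: String)

final class DatabaseAlreadyExistsException(name: String)
  extends RuntimeException(s"Database '$name' already exists")

abstract class Catalog {
  /**
   * Creates a database.
   *
   * One possible contract (an assumption, not the PR's decision):
   *  - if no database with this name exists, it is created;
   *  - if one exists and `ifNotExists` is true, the call is a no-op;
   *  - if one exists and `ifNotExists` is false, a
   *    DatabaseAlreadyExistsException is thrown.
   */
  def createDatabase(dbDefinition: Database, ifNotExists: Boolean): Unit

  def databaseExists(name: String): Boolean
}

final class InMemoryCatalog extends Catalog {
  private val databases = scala.collection.mutable.Map.empty[String, Database]

  // synchronized keeps the check-then-insert atomic across threads.
  override def createDatabase(dbDefinition: Database, ifNotExists: Boolean): Unit =
    synchronized {
      if (databases.contains(dbDefinition.name)) {
        if (!ifNotExists) throw new DatabaseAlreadyExistsException(dbDefinition.name)
      } else {
        databases(dbDefinition.name) = dbDefinition
      }
    }

  override def databaseExists(name: String): Boolean =
    synchronized(databases.contains(name))
}
```

Stating the three cases in the doc comment lets every implementation, in-memory or otherwise, be tested against the same behavior.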



[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...

2016-01-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10608



[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-177111361
  
**[Test build #50438 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50438/consoleFull)**
 for PR 10989 at commit 
[`0139fde`](https://github.com/apache/spark/commit/0139fdeeefc2038e995c44c7e966e09e30063418).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-177129764
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-177129767
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50439/
Test PASSed.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177288104
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50445/
Test PASSed.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177288102
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-177290339
  
**[Test build #50447 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50447/consoleFull)**
 for PR 10995 at commit 
[`21cbc45`](https://github.com/apache/spark/commit/21cbc45d9971f7c64356709fc7d3b5c5ffbb06c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-177286324
  
**[Test build #50446 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50446/consoleFull)**
 for PR 9893 at commit 
[`c59a1e6`](https://github.com/apache/spark/commit/c59a1e667e0142c20ee982171f13bfce02b93aa4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-177286495
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-177286497
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50446/
Test FAILed.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177288023
  
**[Test build #50445 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50445/consoleFull)**
 for PR 10942 at commit 
[`f5acd00`](https://github.com/apache/spark/commit/f5acd00d4c28a6e65ca8200ec93b1874e921e0f0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-177291160
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50447/
Test PASSed.



[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-177291156
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51347234
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +61,141 @@ class BucketedReadSuite extends QueryTest with SQLTestUtils with TestHiveSinglet
     }
   }
 
+  // To verify if pruning works, we compare the results before filtering
+  private def checkPrunedAnswers(
+      sourceDataFrame: DataFrame,
+      filterCondition: Column,
+      expectedAnswer: DataFrame): Unit = {
+    val filter = sourceDataFrame.filter(filterCondition).queryExecution.executedPlan
+    assert(
+      filter.isInstanceOf[execution.Filter] ||
+      (filter.isInstanceOf[WholeStageCodegen] &&
+        filter.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[execution.Filter]))
+    checkAnswer(
+      expectedAnswer.orderBy(expectedAnswer.logicalPlan.output.map(attr => Column(attr)) : _*),
+      filter.children.head.executeCollectPublic().sortBy(_.toString()))
+  }
+
+  test("read partitioning bucketed tables with bucket pruning filters") {
+    val df = (10 until 50).map(i => (i % 5, i % 13 + 10, i.toString)).toDF("i", "j", "k")
+
+    withTable("bucketed_table") {
+      // The number of buckets should be large enough to make sure each bucket contains
+      // at most one bucketing key value.
+      // json does not support predicate push-down, and thus json is used here
--- End diff --

Bucket pruning avoids scanning bucket files that cannot contain matching 
rows. Each bucket file can still hold many different values, and row filtering 
in Parquet is a great feature for scanning a given bucket efficiently. We need 
both to get the best performance.

Let me try to explain why record filtering in Parquet alone does not resolve 
all the issues:
  - The current approach is limited. Row groups are filtered on the min / max 
values in each row group, so it may still scan many row groups that contain no 
matching rows. 
  - It is not free. It still needs to scan metadata to prune row groups. 
  - The Parquet team is trying to improve it by adding more advanced 
statistics to the metadata (e.g., bloom filters in PARQUET-41 and dictionaries 
in PARQUET-384). A few limitations also remain (e.g., PARQUET-295).
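
To make the min / max point concrete, here is a minimal sketch in plain Scala. 
It is not Spark or Parquet code: `RowGroup`, `mayContain`, and `bucketId` are 
hypothetical names, the values are made up, and the modulo bucketing is only an 
assumed stand-in for the real hash-based bucketing.

```scala
// Illustrative only: hypothetical types, not Spark or Parquet APIs.
object BucketPruningSketch {
  // A row group as min/max statistics see it: only the value range is known.
  final case class RowGroup(values: Seq[Int]) {
    val min: Int = values.min
    val max: Int = values.max
    // Min/max pruning can skip a group only when the key is outside [min, max].
    def mayContain(key: Int): Boolean = min <= key && key <= max
  }

  // Assumed stand-in for a bucketing function (the real one hashes the key).
  def bucketId(key: Int, numBuckets: Int): Int = Math.floorMod(key, numBuckets)

  val key = 15
  val rowGroups: Seq[RowGroup] = Seq(
    RowGroup(Seq(10, 12, 22)), // range covers 15, but 15 is absent
    RowGroup(Seq(11, 15, 19)), // actually contains 15
    RowGroup(Seq(20, 21, 22))  // range excludes 15, so it can be skipped
  )

  // Row groups that min/max statistics cannot rule out.
  val scannedByMinMax: Seq[RowGroup] = rowGroups.filter(_.mayContain(key))
  // Row groups that really hold the key.
  val groupsWithKey: Seq[RowGroup] = rowGroups.filter(_.values.contains(key))

  // Bucket pruning: an equality filter on the bucketing key maps to one bucket.
  val numBuckets = 8
  val allBuckets: Seq[Int] = (10 until 23).map(bucketId(_, numBuckets)).distinct
  val bucketsToRead: Seq[Int] = allBuckets.filter(_ == bucketId(key, numBuckets))

  def main(args: Array[String]): Unit = {
    println(s"row groups kept by min/max stats: ${scannedByMinMax.size} of ${rowGroups.size}")
    println(s"row groups that contain $key: ${groupsWithKey.size}")
    println(s"bucket files read after pruning: ${bucketsToRead.size} of ${allBuckets.size}")
  }
}
```

With these made-up values, min / max statistics keep two of the three row 
groups although only one holds the key, while the bucket filter selects a single 
bucket. That is the sense in which the two techniques complement each other.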



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51347237
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +61,141 @@ class BucketedReadSuite extends QueryTest with SQLTestUtils with TestHiveSinglet
     }
   }
 
+  // To verify if pruning works, we compare the results before filtering
+  private def checkPrunedAnswers(
+      sourceDataFrame: DataFrame,
+      filterCondition: Column,
+      expectedAnswer: DataFrame): Unit = {
+    val filter = sourceDataFrame.filter(filterCondition).queryExecution.executedPlan
+    assert(
+      filter.isInstanceOf[execution.Filter] ||
+      (filter.isInstanceOf[WholeStageCodegen] &&
--- End diff --

Sure, will do. 



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51347247
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +61,141 @@ class BucketedReadSuite extends QueryTest with SQLTestUtils with TestHiveSinglet
     }
   }
 
+  // To verify if pruning works, we compare the results before filtering
+  private def checkPrunedAnswers(
+      sourceDataFrame: DataFrame,
+      filterCondition: Column,
--- End diff --

Sure, will do. 



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177265449
  
**[Test build #50445 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50445/consoleFull)**
 for PR 10942 at commit 
[`f5acd00`](https://github.com/apache/spark/commit/f5acd00d4c28a6e65ca8200ec93b1874e921e0f0).



[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-01-30 Thread tedyu

Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-177266287
  
Would like some feedback before creating JIRA.

Thanks



[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-01-30 Thread tedyu

GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/10995

[test-maven] Shade protobuf-java

See this thread for background information:


http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis

This PR shades com.google.protobuf:protobuf-java as 
org.spark-project.protobuf

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10995.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10995


commit 21cbc45d9971f7c64356709fc7d3b5c5ffbb06c8
Author: tedyu 
Date:   2016-01-30T18:16:09Z

Shade protobuf-java





[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-177268114
  
**[Test build #50447 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50447/consoleFull)**
 for PR 10995 at commit 
[`21cbc45`](https://github.com/apache/spark/commit/21cbc45d9971f7c64356709fc7d3b5c5ffbb06c8).



[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-177265711
  
**[Test build #50446 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50446/consoleFull)**
 for PR 9893 at commit 
[`c59a1e6`](https://github.com/apache/spark/commit/c59a1e667e0142c20ee982171f13bfce02b93aa4).



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177133947
  
**[Test build #50441 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50441/consoleFull)**
 for PR 10942 at commit 
[`925827b`](https://github.com/apache/spark/commit/925827bc01e484c1d1ffb584fde86324b0640ca2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177134000
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50441/
Test PASSed.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177136524
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50443/
Test PASSed.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177136523
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13078][SQL] Infrastructure for the inte...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10982#issuecomment-177136488
  
**[Test build #50443 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50443/consoleFull)**
 for PR 10982 at commit 
[`964193d`](https://github.com/apache/spark/commit/964193d920bf494148bbd0deee58c4d1e6dc3327).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class CatalogTestCases extends SparkFunSuite `



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177141259
  
**[Test build #50444 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50444/consoleFull)**
 for PR 10994 at commit 
[`19defc9`](https://github.com/apache/spark/commit/19defc9c83da6206288c7ee70ce97f2e08603f72).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177147895
  
LGTM. Are there other such instances in the code? Best to look for these all 
at once.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread wangyang1992

Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177151828
  
@srowen No, just that one in this file.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10942#issuecomment-177133999
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12982][SQL] Add table name validation i...

2016-01-30 Thread hvanhovell

Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10983#issuecomment-177137733
  
You are using an older version of the master branch (last commit 25 days 
ago). Your version still has the ```org.apache.spark.sql.catalyst.SqlParser``` 
class, which was removed in commit 
https://github.com/apache/spark/commit/7cd7f2202547224593517b392f56e49e4c94cabc.

Please update your master, and try again.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177142164
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50444/
Test PASSed.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177142154
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10994



[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10996#issuecomment-177350346
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10996#issuecomment-177350347
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50451/
Test PASSed.



[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10996#issuecomment-177350319
  
**[Test build #50451 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50451/consoleFull)**
 for PR 10996 at commit 
[`41f5338`](https://github.com/apache/spark/commit/41f533825e080b47f2a31f1dc4cbac0adf39e40f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10934#issuecomment-177353166
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10934#issuecomment-177353167
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50448/
Test PASSed.



[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-01-30 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10934#discussion_r51352293
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -821,6 +821,75 @@ class CheckpointSuite extends TestSuiteBase with 
DStreamCheckpointTester
 checkpointWriter.stop()
   }
 
+  test("SPARK-6847: stack overflow when updateStateByKey is followed by a 
checkpointed dstream") {
+// In this test, there are two updateStateByKey operators. The RDD DAG 
is as follows:
+//
+//     batch 1            batch 2            batch 3     ...
+//
+// 1) input rdd          input rdd          input rdd
+//       |                  |                  |
+//       v                  v                  v
+// 2) cogroup rdd   ---> cogroup rdd   ---> cogroup rdd  ...
+//       |         /        |         /        |
+//       v        /         v        /         v
+// 3)  map rdd ---        map rdd ---        map rdd ...
+//       |                  |                  |
+//       v                  v                  v
+// 4) cogroup rdd   ---> cogroup rdd   ---> cogroup rdd  ...
+//       |         /        |         /        |
+//       v        /         v        /         v
+// 5)  map rdd ---        map rdd ---        map rdd ...
+//
+// Every batch depends on its previous batch, so "updateStateByKey" 
needs to do checkpoint to
+// break the RDD chain. However, before SPARK-6847, when the state RDD 
(layer 5) of the second
+// "updateStateByKey" does checkpoint, it won't checkpoint the state 
RDD (layer 3) of the first
+// "updateStateByKey" (Note: "updateStateByKey" has already marked 
that its state RDD (layer 3)
+// should be checkpointed). Hence, the connections between layer 2 and 
layer 3 won't be broken
+// and the RDD chain will grow infinitely and cause StackOverflow.
+//
+// Therefore SPARK-6847 introduces 
"spark.checkpoint.checkpointAllMarked" to force checkpointing
+// all marked RDDs in the DAG to resolve this issue. (For the previous 
example, it will break
+// connections between layer 2 and layer 3)
+ssc = new StreamingContext(master, framework, batchDuration)
+val batchCounter = new BatchCounter(ssc)
+ssc.checkpoint(checkpointDir)
+val inputDStream = new CheckpointInputDStream(ssc)
+val updateFunc = (values: Seq[Int], state: Option[Int]) => {
+  Some(values.sum + state.getOrElse(0))
+}
+@volatile var shouldCheckpointAllMarkedRDDs = false
+@volatile var rddsCheckpointed = false
+inputDStream.map(i => (i, i))
+  .updateStateByKey(updateFunc).checkpoint(batchDuration)
+  .updateStateByKey(updateFunc).checkpoint(batchDuration)
+  .foreachRDD { rdd =>
+/**
+ * Find all RDDs that are marked for checkpointing in the 
specified RDD and its ancestors.
+ */
+def findAllMarkedRDDs(rdd: RDD[_]): List[RDD[_]] = {
--- End diff --

> I meant put this in a private def outside of this test actually. It would
> make the test body smaller.

But it will refer to the CheckpointSuite class which is not serializable.
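
To illustrate the serialization concern, here is a minimal sketch in plain Scala, without Spark. `FakeSuite`, `ClosureCaptureDemo` and their members are hypothetical names standing in for `CheckpointSuite`; the sketch assumes Scala 2.12 or later, where function literals are serializable and hold the enclosing instance only when they actually use it. A function that calls a method of the enclosing class carries that instance into its serialized form, while a function that only uses locally defined helpers does not.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a test suite class that does not extend Serializable.
class FakeSuite {
  // A method on the class: calling it from a function literal captures `this`.
  def memberHelper(x: Int): Int = x + 1

  // This function references `this.memberHelper`, so it holds the FakeSuite instance.
  def closureUsingMember: Int => Int = (x: Int) => memberHelper(x)

  // This function only uses a locally defined function value, so it does not need `this`.
  def closureUsingLocal: Int => Int = {
    val localHelper = (x: Int) => x + 1
    (x: Int) => localHelper(x)
  }
}

object ClosureCaptureDemo {
  // Returns true if Java serialization accepts the object, false if it reaches
  // a non-serializable reference while writing it.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val suite = new FakeSuite
    println(s"member-based closure serializable: ${canSerialize(suite.closureUsingMember)}")
    println(s"local-only closure serializable:   ${canSerialize(suite.closureUsingLocal)}")
  }
}
```

In the test under review, the function passed to `foreachRDD` is presumably serialized when the streaming context is checkpointed, which would explain why the helper is defined inside the closure rather than as a private method of the suite.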



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177337405
  
**[Test build #50450 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50450/consoleFull)**
 for PR 10702 at commit 
[`e83b822`](https://github.com/apache/spark/commit/e83b8223846cc41942469fc4b78e9f0500239e0f).



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177343128
  
**[Test build #50450 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50450/consoleFull)**
 for PR 10702 at commit 
[`e83b822`](https://github.com/apache/spark/commit/e83b8223846cc41942469fc4b78e9f0500239e0f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177343337
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10934#issuecomment-177353119
  
**[Test build #50448 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50448/consoleFull)**
 for PR 10934 at commit 
[`20e4509`](https://github.com/apache/spark/commit/20e45095506067f3f5195470e3a390cd4872e531).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10678#issuecomment-177353382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50449/
Test PASSed.



[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10934#issuecomment-177327392
  
**[Test build #50448 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50448/consoleFull)**
 for PR 10934 at commit 
[`20e4509`](https://github.com/apache/spark/commit/20e45095506067f3f5195470e3a390cd4872e531).



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177343340
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50450/
Test PASSed.



[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-01-30 Thread dilipbiswal

Github user dilipbiswal commented on the pull request:

https://github.com/apache/spark/pull/10943#issuecomment-177351900
  
@cloud-fan Hi Wenchen, please let me know whether I have interpreted your
suggestion correctly or if something is amiss. `df.resolve()` has many
callers, so I have not changed its name but have added a comment. Let me know
if you want me to refactor it. Thanks.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10678#discussion_r51352501
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -521,38 +522,96 @@ class Analyzer(
*/
   object ResolveSortReferences extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-  case s @ Sort(ordering, global, p @ Project(projectList, child))
-  if !s.resolved && p.resolved =>
-val (newOrdering, missing) = resolveAndFindMissing(ordering, p, 
child)
+  // Here, this rule only resolves the missing sort references if the 
child is not Aggregate
+  //   Another rule ResolveAggregateFunctions will resolve that case.
--- End diff --

@cloud-fan I kept the function implementation in
`ResolveAggregateFunctions`, but I call the function from
`ResolveSortReferences`. Since the rule `ResolveAggregateFunctions` covers two
cases (`filter` and `sort`), I am afraid readers of the code might be confused
if we split it into two rules. The function is public. I am not sure whether
this approach is appropriate.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10678#issuecomment-177331561
  
**[Test build #50449 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50449/consoleFull)**
 for PR 10678 at commit 
[`ba02f46`](https://github.com/apache/spark/commit/ba02f4695e4bfd07a9bef72f783bef3894d8191e).



[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10996#issuecomment-177347951
  
**[Test build #50451 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50451/consoleFull)**
 for PR 10996 at commit 
[`41f5338`](https://github.com/apache/spark/commit/41f533825e080b47f2a31f1dc4cbac0adf39e40f).



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10678#issuecomment-177353314
  
**[Test build #50449 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50449/consoleFull)**
 for PR 10678 at commit 
[`ba02f46`](https://github.com/apache/spark/commit/ba02f4695e4bfd07a9bef72f783bef3894d8191e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177328648
  
Thanks - merging this in.




[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-01-30 Thread Lewuathe

GitHub user Lewuathe opened a pull request:

https://github.com/apache/spark/pull/10996

[ML][MINOR] Invalid MulticlassClassification reference in ml-guide

In 
[ml-guide](https://spark.apache.org/docs/latest/ml-guide.html#example-model-selection-via-cross-validation),
 there is an invalid reference to the `MulticlassClassificationEvaluator` API doc.


https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.MultiClassClassificationEvaluator

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Lewuathe/spark fix-typo-in-ml-guide

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10996


commit 41f533825e080b47f2a31f1dc4cbac0adf39e40f
Author: Lewuathe 
Date:   2016-01-31T00:23:17Z

[ML][MINOR] Invalid MulticlassClassification reference in ml-guide





[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10943#issuecomment-177352984
  
**[Test build #50452 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50452/consoleFull)**
 for PR 10943 at commit 
[`8201994`](https://github.com/apache/spark/commit/82019947e9777a93ac4d137aed52e09a6434b56e).



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10678#issuecomment-177353380
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10997#issuecomment-177376532
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10997#issuecomment-177376534
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50453/
Test PASSed.



[GitHub] spark pull request: [SPARK-12850] [SQL] Support Bucket Pruning (Pr...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10942#discussion_r51355104
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -59,6 +60,136 @@ class BucketedReadSuite extends QueryTest with 
SQLTestUtils with TestHiveSinglet
 }
   }
 
+  // To verify bucket pruning, we compare the contents of remaining 
buckets (before filtering)
+  // with the expectedAnswer.
+  private def checkPrunedAnswers(
+  bucketedDataFrame: DataFrame,
+  expectedAnswer: DataFrame): Unit = {
+val rdd = 
bucketedDataFrame.queryExecution.executedPlan.find(_.isInstanceOf[PhysicalRDD])
+assert(rdd.isDefined)
+checkAnswer(
+  expectedAnswer.orderBy(expectedAnswer.logicalPlan.output.map(attr => 
Column(attr)) : _*),
+  rdd.get.executeCollectPublic().sortBy(_.toString()))
+  }
+
+  test("read partitioning bucketed tables with bucket pruning filters") {
+val df = (10 until 50).map(i => (i % 5, i % 13 + 10, 
i.toString)).toDF("i", "j", "k")
+
+withTable("bucketed_table") {
+  // The number of buckets should be large enough to make sure each 
bucket contains
--- End diff --

why this?



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread iyounus

Github user iyounus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51355081
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala 
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
 }
   }
 
+  test("linear regression model with constant label") {
+/*
+   R code:
+   for (formula in c(b.const ~ . -1, b.const ~ .)) {
+ model <- lm(formula, data=df.const.label, weights=w)
+ print(as.vector(coef(model)))
+   }
+  [1] -9.221298  3.394343
+  [1] 17  0  0
+*/
+val expected = Seq(
+  Vectors.dense(0.0, -9.221298, 3.394343),
+  Vectors.dense(17.0, 0.0, 0.0))
+
+Seq("auto", "l-bfgs", "normal").foreach { solver =>
+  var idx = 0
+  for (fitIntercept <- Seq(false, true)) {
+val model = new LinearRegression()
+  .setFitIntercept(fitIntercept)
+  .setWeightCol("weight")
+  .setSolver(solver)
+  .fit(datasetWithWeightConstantLabel)
+val actual = Vectors.dense(model.intercept, model.coefficients(0), 
model.coefficients(1))
+assert(actual ~== expected(idx) absTol 1e-4)
+idx += 1
--- End diff --

I'm not sure how to _check the size of lost history_. Could you please 
point me to some example?



[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10997#issuecomment-177381245
  
How about the Hive context? Should we update `HiveQl.scala` too?



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177381602
  
**[Test build #50455 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50455/consoleFull)**
 for PR 10702 at commit 
[`c0744d8`](https://github.com/apache/spark/commit/c0744d8a3c08756546925c9f82274f50d1d4affd).



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread dbtsai

Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177385145
  
Commenting on your issues.

Issue 1:

With `WeightedLeastSquares`, we have the option to standardize the label and
features separately. As a result, if the label is not standardized, the
problem can be solved even when `yStd == 0`.

As a result, in your case 4, when the label is not standardized and the
features are standardized, this is not defined, so the users should get the
result.

For case 3, can you elaborate on why an analytical solution exists even when
the label is standardized?

Issue 2:

In my opinion, even case 1 and case 2 are ill-defined, since in GLMNET the
label is standardized by default and GLMNET will not return any result at all.
It just happens that, without regularization, standardizing or not
standardizing the labels does not change the solution, so we just treat them
as if we don't standardize the label. This can explain your case 3.

Issue 3:

I think this is because your normal equation solver doesn't standardize the
label, so the discrepancies occur.
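
As a rough illustration of the `yStd == 0` point, the sketch below shows why standardizing a constant label is undefined while leaving it unstandardized is not. The object and method names are hypothetical and this is not the `WeightedLeastSquares` code; it ignores instance weights and uses the population standard deviation, both of which are simplifying assumptions.

```scala
object ConstantLabelStd {
  // Population standard deviation of the labels (unweighted, for simplicity).
  def std(ys: Seq[Double]): Double = {
    val mean = ys.sum / ys.size
    math.sqrt(ys.map(y => (y - mean) * (y - mean)).sum / ys.size)
  }

  // Scale the labels by 1 / std. Returns None when std is zero, because the
  // scaled problem would require a division by zero.
  def standardize(ys: Seq[Double]): Option[Seq[Double]] = {
    val s = std(ys)
    if (s == 0.0) None else Some(ys.map(_ / s))
  }

  def main(args: Array[String]): Unit = {
    val constantLabel = Seq(17.0, 17.0, 17.0)
    println(s"std of constant label: ${std(constantLabel)}")
    println(s"standardized constant label: ${standardize(constantLabel)}")
    println(s"standardized varying label: ${standardize(Seq(1.0, 3.0))}")
  }
}
```

With a constant label the unstandardized fit still has a well-defined answer (for example, an intercept equal to the constant), whereas any solver that first divides the label by its standard deviation has nothing meaningful to work with.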



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177396200
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50455/
Test PASSed.



[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10943#discussion_r51355492
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -150,6 +153,17 @@ class DataFrame private[sql](
 }
   }
 
+  /**
+   * Resolves a column name. This is called when it is required to resolve 
a column by its
+   * name only and not as a column path..
+   */
+  private[sql] def resolveColName(colName: String, userSuppliedName: 
String): Boolean = {
--- End diff --

How about
```
private[sql] def indexOf(colName: String): Option[Int] = {
  val resolver = sqlContext.analyzer.resolver
  val index = queryExecution.analyzed.output.indexWhere(f => resolver(f.name, colName))
  if (index >= 0) Some(index) else None
}
```

Then we can rewrite `withColumn` to:
```
indexOf(colName).map { index =>
  select(output.updated(index, col.as(colName)).map(Column(_)) : _*)
}.getOrElse {
  select(Column("*"), col.as(colName))
}
```

There may be a better name for this, like `resolveToIndex`.
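
The same idea can be tried outside Spark with plain collections. The sketch below is only an approximation under stated assumptions: `ResolveToIndexDemo`, the `Resolver` type alias, and the two resolver values are hypothetical, and a column is modeled as just its name rather than an attribute or `Column`.

```scala
object ResolveToIndexDemo {
  // A resolver decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Match the whole name against each output column; the name is never split
  // into a path, so a dot in it is treated as an ordinary character.
  def indexOf(columns: Seq[String], colName: String, resolver: Resolver): Option[Int] = {
    val index = columns.indexWhere(name => resolver(name, colName))
    if (index >= 0) Some(index) else None
  }

  // Replace the column in place if it already exists, otherwise append it.
  def withColumn(columns: Seq[String], colName: String, resolver: Resolver): Seq[String] =
    indexOf(columns, colName, resolver)
      .map(index => columns.updated(index, colName))
      .getOrElse(columns :+ colName)

  def main(args: Array[String]): Unit = {
    val columns = Seq("id", "a.b")
    println(indexOf(columns, "A.B", caseInsensitive))
    println(indexOf(columns, "A.B", caseSensitive))
    println(withColumn(columns, "c", caseSensitive))
  }
}
```

Returning `Option[Int]` rather than a `Boolean` lets the caller both test for existence and update the matching position in one step, which is what the `withColumn` rewrite relies on.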



[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-177396130
  
**[Test build #50455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50455/consoleFull)** for PR 10702 at commit [`c0744d8`](https://github.com/apache/spark/commit/c0744d8a3c08756546925c9f82274f50d1d4affd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10678#discussion_r51355710
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -521,38 +522,99 @@ class Analyzer(
    */
   object ResolveSortReferences extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case s @ Sort(ordering, global, p @ Project(projectList, child))
-          if !s.resolved && p.resolved =>
-        val (newOrdering, missing) = resolveAndFindMissing(ordering, p, child)
+      case s @ Sort(_, _, a: Aggregate) if a.resolved =>
--- End diff --

@cloud-fan Sure, let me change it. Thanks!



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10678#discussion_r51355714
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -521,38 +522,99 @@ class Analyzer(
    */
   object ResolveSortReferences extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case s @ Sort(ordering, global, p @ Project(projectList, child))
-          if !s.resolved && p.resolved =>
-        val (newOrdering, missing) = resolveAndFindMissing(ordering, p, child)
+      case s @ Sort(_, _, a: Aggregate) if a.resolved =>
--- End diff --

`ResolveAggregateFunctions` can handle missing attributes that can be 
resolved in the grandchild. For more complex cases, I think that rule can 
at least resolve the aggregate functions, and the analyzer can then come back 
to this rule to complete the resolution.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10678#discussion_r51355706
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -521,38 +522,99 @@ class Analyzer(
    */
   object ResolveSortReferences extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case s @ Sort(ordering, global, p @ Project(projectList, child))
-          if !s.resolved && p.resolved =>
-        val (newOrdering, missing) = resolveAndFindMissing(ordering, p, child)
+      case s @ Sort(_, _, a: Aggregate) if a.resolved =>
--- End diff --

@davies The missing attributes are also handled in 
`ResolveAggregateFunctions`, so this works. To answer your first question 
regarding `!s.resolved`: it is part of the algorithm design of the rule 
`ResolveAggregateFunctions`, as shown here: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L706-L708



[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-177414971
  
**[Test build #50457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50457/consoleFull)** for PR 10527 at commit [`04a14cf`](https://github.com/apache/spark/commit/04a14cf072630fbe3619bf241ff3d10d383594a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-177415021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50457/



[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-177415020
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10678#discussion_r51356255
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -521,38 +522,99 @@ class Analyzer(
    */
   object ResolveSortReferences extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case s @ Sort(ordering, global, p @ Project(projectList, child))
-          if !s.resolved && p.resolved =>
-        val (newOrdering, missing) = resolveAndFindMissing(ordering, p, child)
+      case s @ Sort(_, _, a: Aggregate) if a.resolved =>
--- End diff --

So it seems that the rule in `ResolveAggregateFunctions` does not really 
resolve the missing attributes; we could keep that rule unchanged in this PR.

If this is not trivial to fix, we could create another JIRA for it.



[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10678#discussion_r51356658
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -521,38 +522,99 @@ class Analyzer(
    */
   object ResolveSortReferences extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-      case s @ Sort(ordering, global, p @ Project(projectList, child))
-          if !s.resolved && p.resolved =>
-        val (newOrdering, missing) = resolveAndFindMissing(ordering, p, child)
+      case s @ Sort(_, _, a: Aggregate) if a.resolved =>
--- End diff --

@cloud-fan I will let `ResolveAggregateFunctions` handle the missing 
attribute resolution as long as the child of `Sort` is an `Aggregate`.
```scala
// Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
case sa @ Sort(_, _, child: Aggregate) => sa
```
When rewriting `ResolveSortReferences` in another PR, I will try to make 
the behavior of both rules identical when resolving missing attributes.
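
To illustrate the deferral described above, here is a toy sketch, not the Catalyst implementation. The plan classes below are simplified, hypothetical stand-ins (columns are plain strings and `Sort` has no `global` flag). The rule leaves a `Sort` over an `Aggregate` untouched so that a separate rule can handle it, and for a `Sort` over a `Project` it widens the inner projection with the missing ordering columns, then projects them away again.

```scala
object SortRuleSketch {
  // Simplified, hypothetical plan nodes; the real Catalyst classes carry more state.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan
  case class Aggregate(grouping: Seq[String], child: Plan) extends Plan
  case class Sort(order: Seq[String], child: Plan) extends Plan

  def resolveSortReferences(plan: Plan): Plan = plan match {
    // Skip sort with aggregate: left as-is for the aggregate-specific rule.
    case sa @ Sort(_, _: Aggregate) => sa

    // Sort over project: expose any ordering columns the projection dropped.
    case s @ Sort(order, Project(columns, child)) =>
      val missing = order.filterNot(columns.contains)
      if (missing.isEmpty) s
      else Project(columns, Sort(order, Project(columns ++ missing, child)))

    case other => other
  }
}
```

Putting the skip case first keeps the two rules from competing over the same node, since pattern matching takes the first case that applies.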



[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-01-30 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10989#discussion_r51356688
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
@@ -81,6 +82,30 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
 benchmark.run()
   }
 
+  def testBroadcastHashJoin(values: Int): Unit = {
+    val benchmark = new Benchmark("BroadcastHashJoin", values)
+
+    val dim = broadcast(sqlContext.range(1 << 16).selectExpr("id as k", "cast(id as string) as v"))
+
+    benchmark.addCase("BroadcastHashJoin w/o codegen") { iter =>
+      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
+      sqlContext.range(values).join(dim, (col("id") % 6) === col("k")).count()
+    }
+    benchmark.addCase(s"BroadcastHashJoin w codegen") { iter =>
+      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
+      sqlContext.range(values).join(dim, (col("id") % 6) === col("k")).count()
+    }
+
+    /*
+      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
+      BroadcastHashJoin:                 Avg Time(ms)    Avg Rate(M/s)  Relative Rate
+      -------------------------------------------------------------------------------
+      BroadcastHashJoin w/o codegen           3053.41             3.43         1.00 X
+      BroadcastHashJoin w codegen             1028.40            10.20         2.97 X
--- End diff --

Since the dimension table is pretty small, the overhead of the broadcast is 
also low. When I ran it with a larger range, the improvement did not change 
much, because the lookup in BytesToBytes is the bottleneck. I will open another 
PR to improve the join with a small dimension table.
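
As a quick check of the "Relative Rate" column in the quoted benchmark output, the speedup appears to be the ratio of the two average times. A minimal sketch; `RelativeRateSketch` and `relativeRate` are hypothetical names for illustration and are not part of Spark's `Benchmark` class:

```scala
object RelativeRateSketch {
  // Speedup of a candidate run over the baseline, from average times in ms.
  def relativeRate(baselineMs: Double, candidateMs: Double): Double =
    baselineMs / candidateMs

  // Average times taken from the benchmark comment quoted in the diff.
  val withoutCodegenMs = 3053.41
  val withCodegenMs = 1028.40

  // Rounded to two decimals, this matches the reported 2.97 X.
  val speedup: Double =
    math.round(relativeRate(withoutCodegenMs, withCodegenMs) * 100) / 100.0
}
```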


