[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217533079 @hvanhovell never mind, I made a mistake in the query --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12899#discussion_r62374320 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -155,7 +155,13 @@ private[spark] abstract class Task[T]( */ def

[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217527040 I still can't run Q41 after this PR: ``` select distinct(i_product_name) from item i1 where i_manufact_id between 738 and 738+40 and (select

[GitHub] spark pull request: [SPARK-14654][CORE] New accumulator API

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12612#discussion_r62369241 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -19,200 +19,106 @@ package

[GitHub] spark pull request: [SPARK-14512][DOC] Add python example for Quan...

2016-05-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12281#issuecomment-217511820 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12113#issuecomment-217509314 LGTM, will merge this today if no more comments from other people.

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62362600 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,86 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62361823 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improve the physical p...

2016-05-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12947#issuecomment-217506848 LGTM. @marmbrus Could you take a quick look at this?

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62358876 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,86 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62357413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -26,6 +26,7 @@ import

[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improves the output of...

2016-05-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12947#issuecomment-217355660 @clockfly Can we show the table name instead of `HadoopFiles`, or together with it? If there is no table name, we could use the rightmost part of the path.

[GitHub] spark pull request: [SPARK-14752][SQL] LazilyGenerateOrdering thro...

2016-05-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12661#discussion_r62265714 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -154,12 +154,14 @@ class

[GitHub] spark pull request: [SPARK-14752][SQL] LazilyGenerateOrdering thro...

2016-05-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12661#issuecomment-217296484 Could you add a regression test?

[GitHub] spark pull request: [SPARK-14521][SQL]StackOverflowError in Kryo w...

2016-05-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12598#issuecomment-217294793 @jfchen I think you are hitting a different thing

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62260776 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62259367 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,86 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12887#issuecomment-217243859 Merging this into master, thanks!

[GitHub] spark pull request: [SPARK-11982] [SQL] improve performance of car...

2016-05-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/9969#issuecomment-217087109 Scale factor 1 and 10 (1G and 10G).

[GitHub] spark pull request: [SPARK-15045] [CORE] Remove dead code in TaskM...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12829#issuecomment-217078394 LGTM, merging into master, thanks!

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12887#issuecomment-217077679 @NarineK It would be great if you could update that too.

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62148621 --- Diff: R/pkg/R/DataFrame.R --- @@ -570,10 +570,17 @@ setMethod("unpersist", #' Repartition #' -#' Return a new SparkDataFram

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12899#issuecomment-217075463 It would be great if we could have a test to make sure that we always have this behavior.

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62148066 --- Diff: R/pkg/R/DataFrame.R --- @@ -570,10 +570,17 @@ setMethod("unpersist", #' Repartition #' -#' Return a new SparkDataFram

[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217074291 InMemoryTableScanExec is a case class, not a singleton; why can't we implement these there?

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62147393 --- Diff: R/pkg/R/DataFrame.R --- @@ -570,10 +570,17 @@ setMethod("unpersist", #' Repartition #' -#' Return a new SparkDataFram

[GitHub] spark pull request: [SPARK-15045] [CORE] Remove dead code in TaskM...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12829#discussion_r62130992 --- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java --- @@ -389,12 +389,6 @@ public long cleanUpAllAllocatedMemory

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62125338 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62118057 --- Diff: R/pkg/R/DataFrame.R --- @@ -586,11 +593,22 @@ setMethod("unpersist", #' path <- "path/to/file.json" #' df <-

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62116994 --- Diff: R/pkg/R/DataFrame.R --- @@ -570,10 +570,17 @@ setMethod("unpersist", #' Repartition #' -#' Return a new SparkDataFram

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62116742 --- Diff: R/pkg/R/generics.R --- @@ -167,7 +167,7 @@ setGeneric("reduce", function(x, func) { standardGeneric("reduce") }) #

[GitHub] spark pull request: [SPARK-15110][SparkR] Implement repartitionByC...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12887#discussion_r62116629 --- Diff: R/pkg/R/generics.R --- @@ -167,7 +167,7 @@ setGeneric("reduce", function(x, func) { standardGeneric("reduce") }) #

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62116124 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62115610 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-14670][SQL] Allow updating SQLMetrics o...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12427#issuecomment-217002753 @andrewor14 @cloud-fan Can we target this one to 2.0? The metrics of BroadcastExchange depend on this one.

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12899#issuecomment-217002044 Since we have only one BlockStatusesAccumulator object in TaskMetrics, it may not be worth doing 2).

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62111985 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62112043 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62111647 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62110398 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-216953038 At a high level, I think most of the implementation should be done in InMemoryTableScanExec, not in WholeStageCodegenExec.

[GitHub] spark pull request: [MINOR] Add python3 compatibility in python ex...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12868#issuecomment-216949174 LGTM, merging into master and 2.0.

[GitHub] spark pull request: [SPARK-15115][SQL] Reorganize whole stage code...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12891#issuecomment-216948700 LGTM

[GitHub] spark pull request: [SPARK-14951][SQL] Support subexpression elimi...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12729#issuecomment-216947555 Merging this into master and 2.0 branch, thanks!

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12899#issuecomment-216940164 ``` scala> ser.newInstance().serialize(ArrayBuffer.empty[Any]) res4: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=173 cap=214] sc

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12899#issuecomment-216937041 @cloud-fan How much can we gain from 2)?

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12899#discussion_r62077628 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -291,25 +291,32 @@ private[spark] object TaskMetrics extends Logging

[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12899#discussion_r62077575 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1097,8 +1097,8 @@ class DAGScheduler( throw new

[GitHub] spark pull request: [SPARK-15107][SQL] Allow varying # iterations ...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12884#issuecomment-216751414 LGTM

[GitHub] spark pull request: [SPARK-5929][PYSPARK] Context addPyPackage and...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12398#issuecomment-216751245 @buckhx These APIs seem useful; could you also add an argument for bin/spark-submit (only for the requirements file)?

[GitHub] spark pull request: [SPARK-5929][PYSPARK] Context addPyPackage and...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12398#discussion_r61992444 --- Diff: python/pyspark/context.py --- @@ -814,6 +817,40 @@ def addPyFile(self, path): import importlib

[GitHub] spark pull request: [SPARK-5929][PYSPARK] Context addPyPackage and...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12398#discussion_r61992299 --- Diff: python/pyspark/context.py --- @@ -814,6 +817,40 @@ def addPyFile(self, path): import importlib

[GitHub] spark pull request: [SPARK-5929][PYSPARK] Context addPyPackage and...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12398#discussion_r61992184 --- Diff: python/pyspark/tests.py --- @@ -1947,6 +1947,33 @@ def test_with_stop(self): sc.stop() self.assertEqual(SparkContext

[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-216749219 The number of partitions could be specified in repartitioning() (or 200 by default). `KeyValueGroupedDataset.flatMapGroups()` accepts a function as `(K, Iterator

[GitHub] spark pull request: [SPARK-15107][SQL] Allow varying # iterations ...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12884#issuecomment-216748498 We changed the number of rows in the same benchmark but did not update the results; should we also update the results?

[GitHub] spark pull request: [SPARK-15107][SQL] Allow varying # iterations ...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12884#discussion_r61991224 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala --- @@ -346,7 +373,7 @@ class BenchmarkWholeStageCodegen

[GitHub] spark pull request: [SPARK-15029] improve error message for Genera...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12810#discussion_r61991082 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -41,19 +41,17 @@ import org.apache.spark.sql.types

[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-216704563 @NarineK repartition() could use a list of columns as the partition key (we need to update the R API; the Scala and Python APIs already have that), then all the rows having

[GitHub] spark pull request: [SPARK-15095] [SQL] remove HiveSessionHook fro...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12881#issuecomment-216686765 cc @rxin

[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-216686038 I don't understand what the semantics of gapply(df, f, "a") are here; does it work like `dapply(repartition(df, "a"), f)`?
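
The question above turns on the difference between applying a function once per partition and once per key group. The following is a minimal plain-Python sketch of that distinction, not SparkR's implementation: `repartition_by_key`, `apply_per_partition`, and `apply_per_group` are hypothetical helper names, and the modulo-of-hash partitioning is an assumption made only for illustration.

```python
from collections import defaultdict


def repartition_by_key(rows, key, num_partitions):
    """Hash-partition rows so rows sharing a key value land in the same partition.

    Hypothetical stand-in for a repartition-by-column step; not Spark code.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def apply_per_partition(partitions, f):
    """Call f once per non-empty partition; a partition may hold several keys."""
    return [f(part) for part in partitions if part]


def apply_per_group(rows, key, f):
    """Call f once per distinct key value, regardless of partitioning."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: f(group) for k, group in groups.items()}


rows = [
    {"a": 1, "v": 10},
    {"a": 2, "v": 20},
    {"a": 1, "v": 30},
    {"a": 3, "v": 40},
]


def total(part):
    # Sum the "v" column of whatever rows f is handed.
    return sum(r["v"] for r in part)


# With 2 partitions and 3 keys, at least two keys must share a partition,
# so the per-partition results mix keys while the per-group results do not.
per_partition = apply_per_partition(repartition_by_key(rows, "a", 2), total)
per_group = apply_per_group(rows, "a", total)
```

In this sketch the two approaches agree only when every partition happens to contain a single key, which is why the semantics asked about in the comment matter.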

[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r61968540 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -2083,6 +2083,46 @@ test_that("dapply() on a DataFrame", { expect_identical(expect

[GitHub] spark pull request: [SPARK-15095] [SQL] remove HiveSessionHook fro...

2016-05-03 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/12881 [SPARK-15095] [SQL] remove HiveSessionHook from ThriftServer ## What changes were proposed in this pull request? Remove HiveSessionHook ## How was this patch tested

[GitHub] spark pull request: [SPARK-15102] [SQL] remove delegation token su...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12878#issuecomment-216678094 Oh, my bad..

[GitHub] spark pull request: [SQL-15102] [SQL]remove delegation token suppo...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12878#issuecomment-216675454 The ThriftServer is only used for SQL, right?

[GitHub] spark pull request: [SQL-15102] [SQL]remove delegation token suppo...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12878#issuecomment-216667942 @rxin Should we also remove the APIs?

[GitHub] spark pull request: [SQL-15102] [SQL]remove delegation token suppo...

2016-05-03 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/12878 [SQL-15102] [SQL]remove delegation token support from ThriftServer ## What changes were proposed in this pull request? These APIs are only useful for Hadoop and may not work for Spark SQL

[GitHub] spark pull request: [SPARK-14521][SQL]StackOverflowError in Kryo w...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12598#issuecomment-216662322 Yes, updated the comment

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11327#issuecomment-216658381 Merging this into master, thanks!

[GitHub] spark pull request: [SPARK-14521][SQL]StackOverflowError in Kryo w...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12598#issuecomment-216657833 Merging this into master, thanks!

[GitHub] spark pull request: [SPARK-15095] [SQL] drop binary mode in Thrift...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12876#issuecomment-216642750 cc @rxin

[GitHub] spark pull request: [SPARK-15095] [SQL] drop binary mode in Thrift...

2016-05-03 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/12876 [SPARK-15095] [SQL] drop binary mode in ThriftServer ## What changes were proposed in this pull request? This PR drops support for binary mode in ThriftServer, only HTTP mode

[GitHub] spark pull request: [SPARK-9372] [SQL] Filter nulls in Inner joins...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/9451#issuecomment-216623160 @vidma I think this is already fixed in master (joins now have constraints, which are turned into predicates and pushed down); would you mind closing this PR

[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11508#issuecomment-216619359 @zuowang We already have a default size (4k) for ObjectType; would you mind closing this PR?

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61929447 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,77 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61929305 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,86 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61929340 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,86 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61928761 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,86 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11327#issuecomment-216603791 LGTM, pending tests. It's great to have a 200X speedup, thanks!

[GitHub] spark pull request: [SPARK-14521][SQL]StackOverflowError in Kryo w...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12598#issuecomment-216592103 LGTM, two minor comments

[GitHub] spark pull request: [SPARK-14521][SQL]StackOverflowError in Kryo w...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12598#discussion_r61914539 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -324,8 +348,10 @@ private[joins] object

[GitHub] spark pull request: [SPARK-14521][SQL]StackOverflowError in Kryo w...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12598#discussion_r61914359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -171,36 +174,53 @@ private[joins] class

[GitHub] spark pull request: [SPARK-15088][SQL] Remove SparkSqlSerializer

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12864#issuecomment-216590257 LGTM Merging this into master, thanks!

[GitHub] spark pull request: [SPARK-14951][SQL] Support subexpression elimi...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12729#discussion_r61913388 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -572,6 +588,65 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61842033 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,77 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61841998 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,77 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

2016-05-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12860#discussion_r61841944 --- Diff: python/pyspark/sql/session.py --- @@ -445,6 +452,77 @@ def read(self): """ return DataFrameReade

[GitHub] spark pull request: [SPARK-14951][SQL] Support subexpression elimi...

2016-05-03 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12729#issuecomment-216447436 LGTM, only one comment. BTW, maybe the C1/C2 compiler could also remove the duplicated expressions; could you disable this feature and re-run the benchmark

[GitHub] spark pull request: [SPARK-14951][SQL] Support subexpression elimi...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12729#discussion_r61840224 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -572,6 +588,65 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216396595 LGTM, just one minor comment.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61821978 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -246,12 +260,40 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14785][SQL] Support correlated scalar s...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12822#discussion_r61813244 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -109,7 +109,7 @@ case class Filter

[GitHub] spark pull request: [SPARK-14785][SQL] Support correlated scalar s...

2016-05-02 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12822#issuecomment-216379989 LGTM, will merge this one once it passes the tests.

[GitHub] spark pull request: [SPARK-14785][SQL] Support correlated scalar s...

2016-05-02 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12822#issuecomment-216376941 @hvanhovell They work well now. Could you also update Filter to not create constraints from a predicate that has a correlated subquery?

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11327#issuecomment-216358709 @tgravescs Could you also update the description to reflect the new changes?

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11327#discussion_r61801165 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -334,8 +332,41 @@ private class DefaultPartitionCoalescer(val balanceSlack

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11327#discussion_r61801132 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -334,8 +332,41 @@ private class DefaultPartitionCoalescer(val balanceSlack

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11327#discussion_r61798932 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -320,7 +317,8 @@ private class DefaultPartitionCoalescer(val balanceSlack

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11327#discussion_r61798850 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -289,10 +284,12 @@ private class DefaultPartitionCoalescer(val balanceSlack

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11327#discussion_r61798109 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -169,43 +169,41 @@ private class DefaultPartitionCoalescer(val balanceSlack

[GitHub] spark pull request: [SPARK-11316] coalesce doesn't handle UnionRDD...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11327#discussion_r61797836 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -169,43 +169,41 @@ private class DefaultPartitionCoalescer(val balanceSlack
