[GitHub] spark issue #19318: [SPARK-22096][ML] use aggregateByKeyLocally in feature f...

2017-09-28 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/19318 @VinceShieh can you please mark this PR's title as "[WIP]"? --- - To unsubscribe, e-mail: review

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2017-06-14 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13585 Oh, yes, I am closing it, will reopen it when we have another idea. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2017-06-14 Thread chenghao-intel
Github user chenghao-intel closed the pull request at: https://github.com/apache/spark/pull/13585

[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...

2017-06-04 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/17936 I understand that any code change in Spark core is hard to review because of regression concerns; I think we can leave the PR for discussion. 1) Actually the `UnsafeCartesianRDD

[GitHub] spark issue #17359: [SPARK-20028][SQL] Add aggreagate expression nGrams

2017-03-24 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/17359 @rxin nGram is a built-in UDAF in Hive, and some users complain that they faced performance issues when running queries with nGram.

[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211716 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -514,6 +514,34 @@ case class OptimizeCodegen

[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/16245 I think `Scala UDF needs extra conversion between internal format and external format on input and out` is true most of the time, but not all of the time; for example, some built-in string

[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211489 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -514,6 +514,34 @@ case class OptimizeCodegen

[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/16245 Actually I doubt this is really an optimization, as the assumption that a Scala UDF is slower than a non-Scala UDF is probably not always true.

[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211330 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -514,6 +514,34 @@ case class OptimizeCodegen

[GitHub] spark issue #16476: [SPARK-19084][SQL][WIP] Implement expression field

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/16476 Since the different data type will be simply ignored, I think we'd better also add the optimization rule in `Optimizer`. As well as the python/scala API support, but need to confirm

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95283107 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1528,6 +1528,18 @@ object functions { def factorial(e: Column

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95282681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,102 @@ object

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95282465 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,102 @@ object

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95282270 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,102 @@ object

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95281248 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,102 @@ object

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95281159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,102 @@ object

[GitHub] spark pull request #16476: [SPARK-19084][SQL][WIP] Implement expression fiel...

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95281046 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,102 @@ object

[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2017-01-09 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/16476 @gczsjdy can you please add [WIP] to the title until you feel the code is ready for review?

[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field

2017-01-08 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object

[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field

2017-01-08 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95080582 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +341,91 @@ object

[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-24 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/15579 Oh, thank you @jerryshao , just noticed you gave inputs also. :)

[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-24 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/15579 @srowen Besides `numactl`, some profiling tools like `valgrind`, `strace`, `vtune`, and also the system call hacks we probably need before the executor process is launched

[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-16 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/15361 yes, please go ahead. :)

[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks from di...

2016-09-06 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/10225#discussion_r77748327 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -136,7 +136,9 @@ private[spark] class

[GitHub] spark issue #14366: [SPARK-16732][SQL] Remove unused codes in subexpressionE...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/14366 Ping @rxin , seems the upstream is not updated.

[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/12646 I like this PR since it's part of the SQL standard, but there is also another Jira, https://issues.apache.org/jira/browse/SPARK-17299 ; maybe we can fix that in a follow-up PR. Can you

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76966164 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2677,4 +2678,107 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76966028 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2677,4 +2678,107 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76965552 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -476,6 +476,61 @@ public UTF8String trim

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76965110 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -1789,6 +1803,133 @@ class SQLQuerySuite extends

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76963822 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -431,56 +432,233 @@ case class

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76963598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -431,56 +432,233 @@ case class

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76963573 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -431,56 +432,233 @@ case class

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76963406 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -431,56 +432,233 @@ case class

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76962244 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -501,6 +578,38 @@ public UTF8String trimRight

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76962088 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -488,6 +543,28 @@ public UTF8String trimLeft

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76961869 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -488,6 +543,28 @@ public UTF8String trimLeft

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2016-08-31 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r76961503 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -488,6 +543,28 @@ public UTF8String trimLeft

[GitHub] spark issue #14481: [WIP][SPARK-16844][SQL] Generate code for sort based agg...

2016-08-17 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/14481 @yucai can you please rebase the code?

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-07-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r72184495 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-07-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r72184424 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #14169: [SPARK-16515][SQL]set default record reader and w...

2016-07-17 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/14169#discussion_r71085323 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1329,7 +1329,7 @@ class SparkSqlAstBuilder(conf

[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/14169 HiveConf provides default value `org.apache.hadoop.hive.ql.exec.TextRecordReader`, `org.apache.hadoop.hive.ql.exec.TextRecordWriter` for keys `hive.script.recordreader

[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/14169 LGTM. cc @yhuai @liancheng This breaks existing applications that use the default delimiter, and we've already verified it in TPCx-BB.

[GitHub] spark issue #13542: [SPARK-15730][SQL] Respect the --hiveconf in the spark-s...

2016-06-30 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13542 @yhuai I couldn't find any piece of code to copy the `HiveConf`(from SessionState) to `SqlConf`? Can you confirm this? Probably that's the reason why --hiveconf doesn't work

[GitHub] spark issue #13542: [SPARK-15730][SQL] Respect the --hiveconf in the spark-s...

2016-06-14 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13542 Thanks @jameszhouyi , I've removed the `WIP` from the title.

[GitHub] spark pull request #13542: [SPARK-15730][SQL] Respect the --hiveconf in the ...

2016-06-14 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13542#discussion_r67083155 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -91,6 +91,8 @@ class CliSuite extends

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-13 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66743318 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-13 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66743131 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-13 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66742892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66741771 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66733226 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13585 Updated with a more meaningful function name and added more unit tests.

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13585 cc @liancheng

[GitHub] spark pull request #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13542#discussion_r66731077 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -91,6 +91,8 @@ class CliSuite extends

[GitHub] spark issue #13530: [SPARK-14279][BUILD] Pick the spark version from pom

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13530 `spark-version-info.properties` cannot be found on my development machine, which causes an NPE while debugging with an IDE; should we add the default version info back for developers like me

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66714698 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66714358 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66714324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66714314 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala --- @@ -65,4 +69,95 @@ class QueryPartitionSuite extends

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13585 Thank you all for the review, but I am not going to solve the CNF; the intention of this PR is to extract more partition pruning expressions, so we will have fewer partitions to scan during

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66714297 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-09 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/13585 [SPARK-15859][SQL] Optimize the partition pruning within the disjunction ## What changes were proposed in this pull request? In disjunction, the partition pruning expression can simply

[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-09 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13542 Currently, the SparkSQL CLI ignores the configuration passed from the command line via `--hiveconf`; this will break lots of existing applications. It's not by design, is it? @yhuai @rxin

[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-09 Thread chenghao-intel
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13542 cc @yhuai

[GitHub] spark pull request #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in...

2016-06-07 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/13542 [SPARK-15730][SQL][WIP] Respect the --hiveconf in the spark-sql command line ## What changes were proposed in this pull request? We should respect the --hiveconf in the spark-sql

[GitHub] spark pull request: [SPARK-15480][UI][Streaming]show missed InputI...

2016-05-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/13259#issuecomment-221166631 cc @zsxwing

[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...

2016-04-14 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/12391#issuecomment-209937183 LGTM except some minor comments.

[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...

2016-04-14 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r59711889 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala --- @@ -46,4 +48,23 @@ class HiveExternalCatalogSuite

[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...

2016-04-14 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r59711729 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala --- @@ -17,20 +17,22 @@ package

[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...

2016-04-14 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r59711674 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala --- @@ -17,20 +17,22 @@ package

[GitHub] spark pull request: [SPARK-12610][SQL] Add Anti join operators

2016-04-06 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/10563#issuecomment-206693926 Closing this PR since it was merged in #12214

[GitHub] spark pull request: [SPARK-12610][SQL] Add Anti join operators

2016-04-06 Thread chenghao-intel
Github user chenghao-intel closed the pull request at: https://github.com/apache/spark/pull/10563

[GitHub] spark pull request: [SPARK-12196][Core] Store/retrieve blocks in d...

2016-04-05 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/10225#issuecomment-205712538 @JoshRosen I am not sure whether this is still part of your refactorings, or can we bring this PR back up? This PR is quite a critical performance improvement when mixed

[GitHub] spark pull request: [SPARK-14021][SQL] custom context support for ...

2016-03-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/11843#issuecomment-202395548 cc @yhuai , this is critical for our own customized `HiveContext`, can you please merge this?

[GitHub] spark pull request: [SPARK-14021][SQL] custom context support for ...

2016-03-24 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/11843#issuecomment-200900425 cc @rxin @liancheng

[GitHub] spark pull request: [SPARK-14021][SQL][WIP] custom context support...

2016-03-23 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11843#discussion_r57120180 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala --- @@ -34,6 +34,20 @@ private[hive] object

[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/11713#issuecomment-197131091 cc @rxin @JoshRosen

[GitHub] spark pull request: [SPARK-13895][SQL]Change the return type of Da...

2016-03-15 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/11731 [SPARK-13895][SQL]Change the return type of DataFrameReader.text ## What changes were proposed in this pull request? Change the return type of `DataFrameReader.text` from `DataFrame

[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/11713#issuecomment-196816685 BTW, @carsonwang can you also describe what would happen to applications with dynamic allocation enabled without this change? This will help people

[GitHub] spark pull request: [SPARK-13894][SQL] SqlContext.range return typ...

2016-03-15 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/11730 [SPARK-13894][SQL] SqlContext.range return type from DataFrame to DataSet ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13894 Change

[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11713#discussion_r56130697 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -73,7 +73,8 @@ private[spark] class ApplicationMaster

[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-189470296 LGTM except for some minor suggestions.

[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r54296531 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala --- @@ -89,4 +89,25 @@ class HiveTableScanSuite

[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r54296448 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r54296499 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r54170956 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/AccumulableCheckpoint.scala --- @@ -0,0 +1,37 @@ +/* + * Licensed

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r54170867 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -269,6 +270,33 @@ class StreamingContext private

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r54170612 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -269,6 +270,33 @@ class StreamingContext private

[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

2016-02-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11101#discussion_r54167619 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala --- @@ -123,6 +126,15 @@ class JobGenerator(jobScheduler

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-24 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53925855 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -41,6 +41,13 @@ class Checkpoint(ssc: StreamingContext, val

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53419256 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -269,6 +270,39 @@ class StreamingContext private

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53419167 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -41,6 +41,13 @@ class Checkpoint(ssc: StreamingContext, val

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53419010 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -41,6 +41,13 @@ class Checkpoint(ssc: StreamingContext, val

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53417873 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -269,6 +270,39 @@ class StreamingContext private

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53417573 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -41,6 +41,13 @@ class Checkpoint(ssc: StreamingContext, val

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53417343 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -41,6 +41,13 @@ class Checkpoint(ssc: StreamingContext, val

[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11101#discussion_r53417276 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala --- @@ -123,6 +126,12 @@ class JobGenerator(jobScheduler

[GitHub] spark pull request: [Spark-13374][Streaming][wip] make it possible...

2016-02-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11249#discussion_r53400290 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -269,6 +270,39 @@ class StreamingContext private
