[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 I think we can set up a place (mailing list or JIRA) to discuss the IR design further, as suggested by @HyukjinKwon. This can be a collaboration among interested parties

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210799970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +460,29 @@ object SQLConf { .intConf

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 If we continue improving the current codegen framework, I think it is good to have a design doc reviewed by the community. If we decide to have an IR design and get rid of this string based

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210785081 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +458,29 @@ object SQLConf { .intConf

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210779310 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +458,29 @@ object SQLConf { .intConf

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210765335 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -25,17 +25,16 @@ import java.util.zip.Deflater import

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 Thanks to everyone involved in this discussion! Sorry, I was on a long flight and feel too tired to reply now. I just want to say for now, no matter what decision we make, to continue to

[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...

2018-08-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22024 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 Thanks @HyukjinKwon

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 Yeah. The encoding rules regarding `Option` are: 1. For non `Product` types, no change is made. ``` Option[Int] in normal encoder -> a nullable int column Option[Int] in

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 @HyukjinKwon Thanks! I've updated. Please let me know if they are good now.

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21439 I think the R side is not updated for this yet. @huaxingao would you like to do that?

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21859 retest this please.

[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21980 Thanks2! @HyukjinKwon

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209486478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-08-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21439 LGTM too.

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-08-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21439 retest this please.

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21859 retest this please.

[GitHub] spark issue #22076: [SPARK-25090][ML] Enforce implicit type coercion in Para...

2018-08-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22076 LGTM

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209464591 --- Diff: python/pyspark/worker.py --- @@ -275,6 +280,10 @@ def main(infile, outfile): shuffle.DiskBytesSpilled = 0

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209464569 --- Diff: python/pyspark/worker.py --- @@ -261,6 +263,9 @@ def main(infile, outfile): # initialize global state taskContext

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209464621 --- Diff: python/pyspark/taskcontext.py --- @@ -29,6 +29,7 @@ class TaskContext(object): """ _tas

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209464679 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +183,42 @@ private[spark] abstract class BasePythonRunner[IN

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209464515 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +183,42 @@ private[spark] abstract class BasePythonRunner[IN

[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...

2018-08-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r209464356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression

[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22037 retest this please.

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209418749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209418206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209417666 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,9 +169,17 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209417629 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209410871 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209407875 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,9 +169,17 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209408395 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21980 Thanks @zsxwing

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209374683 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 Thanks @mgaido91

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22066 > @viirya is that effort going on? I can help with the work if you want. Thanks. @mgaido91 Yeah, I'm still working on it. One of the PRs #21537 is still waiting fo

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 Thank you! @hvanhovell

[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22019 SGTM too.

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21732#discussion_r209144900 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -43,20 +43,17 @@ import

[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...

2018-08-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21847 A few minor comments. LGTM.

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r209116303 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -725,6 +744,205 @@ class AvroSuite extends QueryTest with

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r209115909 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -87,10 +87,36 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r209115631 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -87,10 +87,36 @@ class AvroSerializer(rootCatalystType

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22037#discussion_r209067476 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -479,6 +481,26 @@ object Decimal { dec

[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22037#discussion_r209083275 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala --- @@ -139,7 +152,22 @@ object SchemaConverters

[GitHub] spark issue #21826: [SPARK-24872] Replace the symbol '||' of Or operator wit...

2018-08-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21826 retest this please.

[GitHub] spark issue #21535: [SPARK-23596][SQL] Test interpreted path on encoders tes...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21535 Thanks @HyukjinKwon @hvanhovell @mgaido91 @kiszk

[GitHub] spark pull request #22018: [SPARK-25038][SQL] Accelerate Spark Plan generati...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22018#discussion_r208784598 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -297,7 +297,7 @@ object InMemoryFileIndex

[GitHub] spark pull request #22018: [SPARK-25038][SQL] Accelerate Spark Plan generati...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22018#discussion_r208779110 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -297,7 +297,7 @@ object InMemoryFileIndex

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21859 This optimization is only for SQL, but other places also use `RangePartitioner`. How would it affect those other places?

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208775191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208775694 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208774474 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2799,6 +2799,26 @@ class SQLQuerySuite extends QueryTest with

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208775584 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,7 +169,16 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208774179 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2799,6 +2799,26 @@ class SQLQuerySuite extends QueryTest with

[GitHub] spark pull request #22046: [JUST_TEST][NOT_MERGE] Test for VersionsSuite

2018-08-08 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/22046

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208770924 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,7 +169,16 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark issue #22046: [JUST_TEST][NOT_MERGE] Test for VersionsSuite

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22046 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 I set up a test PR for `VersionsSuite` at #22046.

[GitHub] spark pull request #22046: [JUST_TEST][NOT_MERGE] Test for VersionsSuite

2018-08-08 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22046 [JUST_TEST][NOT_MERGE] Test for VersionsSuite ## What changes were proposed in this pull request? A few test cases in `VersionsSuite` continue to fail in one of my PRs. But it isn't reprodu

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 I don't run into this test failure in `VersionsSuite` locally.

[GitHub] spark issue #22041: [SPARK-25058][SQL] Use Block.isEmpty/nonEmpty to check w...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22041 LGTM

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 Looks like an unrelated test failure at `VersionsSuite`...

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #21520: [SPARK-24505][SQL] Forbidding string interpolation in Co...

2018-08-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21520 @HyukjinKwon Thanks for looking into this. It is based on the comment and discussion here https://github.com/apache/spark/pull/21193#discussion_r186627099

[GitHub] spark issue #22027: [SPARK-25010][SQL][FOLLOWUP] Shuffle should also produce...

2018-08-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22027 retest this please.

[GitHub] spark pull request #22008: [SPARK-24928][SQL] Optimize cross join according ...

2018-08-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22008#discussion_r208385422 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,45 @@ object EliminateOuterJoin extends Rule

[GitHub] spark pull request #22008: [SPARK-24928][SQL] Optimize cross join according ...

2018-08-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22008#discussion_r208353576 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -158,8 +158,9 @@ abstract class Optimizer

[GitHub] spark pull request #22008: [SPARK-24928][SQL] Optimize cross join according ...

2018-08-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22008#discussion_r208352309 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,45 @@ object EliminateOuterJoin extends Rule

[GitHub] spark issue #22027: [SPARK-25010][SQL][FOLLOWUP] Shuffle should also produce...

2018-08-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22027 LGTM

[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-08-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22019 Empty string should be treated as null for all non string types? https://github.com/apache/spark/blob/ef57fdd5b0a6f7f0b6343c91c6983d20bc67fb5b/sql/catalyst/src/main/scala/org/apache/spark/sql

[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21991 The failed test is `FlatMapGroupsWithStateSuite.flatMapGroupsWithState`. I have seen it fail occasionally. I think it should not be related to this change. @HyukjinKwon @dbtsai

[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21980#discussion_r208078032 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -854,6 +854,26 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `S...

2018-08-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21991#discussion_r208055413 --- Diff: dev/merge_spark_pr.py --- @@ -142,6 +142,9 @@ def merge_pr(pr_num, target_ref, title, body, pr_repo_desc): distinct_authors[0

[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21980#discussion_r207971182 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala --- @@ -75,14 +74,11 @@ class IncrementalExecution

[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21984#discussion_r207746389 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala --- @@ -100,6 +103,8 @@ class AvroDeserializer(rootAvroType: Schema

[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21948 LGTM

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207725721 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -33,7 +33,10 @@ public interface

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21980 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @hvanhovell Shall we consider including this in 2.4?

[GitHub] spark issue #21988: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-08-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21988 And put `[BRANCH-2.3]` into the title of the PR #21989 too?

[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 The regression happens when writing. Looks like when benchmarking writing time, we don't use `df.count`?

[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 LGTM

[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 Ah, finally I can reproduce this. It needs to allocate the array feature with length 16000. I reduced it to 1600 and that largely relieved the regression. `com.databricks.spark.avro` is faster

[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21980#discussion_r207442765 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala --- @@ -75,14 +75,15 @@ class IncrementalExecution

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207409267 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -333,7 +340,7 @@ private[spark] class Client( val

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207409771 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-02 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21980 [SPARK-25010][SQL] Rand/Randn should produce different values for each execution in streaming query ## What changes were proposed in this pull request? Like Uuid in SPARK-24896, Rand and

[GitHub] spark issue #21952: [SPARK-24993] [SQL] [WIP] Make Avro Fast Again

2018-08-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 @dbtsai This is what I see when testing on Spark 2.3. Compared with the numbers above, it seems to me there is no such significant difference, same as your findings. ```scala
