[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...

2018-12-05 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/23222 I'm not sure how this ended up being omitted. `TransposeWindowSuite` will be fine since it creates a simple optimizer from this rule and a few others. The new test added

[GitHub] spark pull request #22445: Branch 2.3 udf nullability

2018-09-17 Thread ptkool
Github user ptkool closed the pull request at: https://github.com/apache/spark/pull/22445 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22445: Branch 2.3 udf nullability

2018-09-17 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/22445 Branch 2.3 udf nullability ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-01-25 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 It should throw a proper exception. I will make the necessary code changes.

[GitHub] spark pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability suppo...

2018-01-25 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/18906#discussion_r163828796 --- Diff: python/pyspark/sql/functions.py --- @@ -2105,6 +2105,14 @@ def udf(f=None, returnType=StringType()): >>> impo

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-12-28 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @HyukjinKwon @holdenk There is still an issue with the use of `SparkSession.udf.register` that needs to be resolved. For instance, the following will not work as expected: ```python

[GitHub] spark issue #18424: [SPARK-17091] Add rule to convert IN predicate to equiva...

2017-12-04 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18424 @a10y Yes, I'm still tracking this.

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-12-01 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @HyukjinKwon As requested, here are the related Scala API changes: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-11-30 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @holdenk I believe the changes in this PR match what's provided in the Scala API. Am I missing something?

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-11-14 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 Here are the similar changes in the Scala API: https://github.com/apache/spark/pull/17911

[GitHub] spark pull request #19672: [SPARK-22456] Add support for dayofweek function

2017-11-06 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/19672 [SPARK-22456] Add support for dayofweek function ## What changes were proposed in this pull request? This PR adds support for a new function called `dayofweek` that returns the day of the week
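
The snippet above is cut off before it states how days are numbered. As a rough illustration only, the following plain-Python sketch assumes the 1 = Sunday through 7 = Saturday convention documented for Spark's `dayofweek` function; the helper name `day_of_week` is hypothetical and is not code from the PR.

```python
from datetime import date


def day_of_week(d):
    """Return 1 for Sunday through 7 for Saturday (assumed numbering)."""
    # isoweekday() yields Monday=1 ... Sunday=7, so shift Sunday to the front.
    return d.isoweekday() % 7 + 1


# 2017-11-06, the date this PR was opened, fell on a Monday.
monday = day_of_week(date(2017, 11, 6))
```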

[GitHub] spark pull request #19415: Branch 2.2 udf nullability

2017-10-02 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/19415 Branch 2.2 udf nullability ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain

[GitHub] spark pull request #19415: Branch 2.2 udf nullability

2017-10-02 Thread ptkool
Github user ptkool closed the pull request at: https://github.com/apache/spark/pull/19415

[GitHub] spark pull request #19414: Udf nullablity fixes

2017-10-02 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/19414 Udf nullablity fixes ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how

[GitHub] spark pull request #19414: Udf nullablity fixes

2017-10-02 Thread ptkool
Github user ptkool closed the pull request at: https://github.com/apache/spark/pull/19414

[GitHub] spark pull request #19209: Branch 2.2 udf nullability

2017-09-12 Thread ptkool
Github user ptkool closed the pull request at: https://github.com/apache/spark/pull/19209

[GitHub] spark pull request #19209: Branch 2.2 udf nullability

2017-09-12 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/19209 Branch 2.2 udf nullability ## What changes were proposed in this pull request? When registering a Python UDF, a user may know whether the function can return null values or not. PythonUDF

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-30 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @rxin This PR isn't about performance at all. I realize Python UDFs do not perform well and I also realize annotating Python UDFs with nullability is not going to make any difference

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-23 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @rxin We have several large systems with 100s of Spark jobs implemented in Python and PySpark, and use Python UDFs due to lack of equivalent functionality in Spark. I understand what you're saying re

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-21 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @ueshin Thanks for commenting. It's unfortunate that users find nullability confusing. If you're coming from a SQL world, you should be quite familiar with nullability and null values

[GitHub] spark pull request #18906: [SPARK-21692] Add nullability support to PythonUD...

2017-08-10 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/18906 [SPARK-21692] Add nullability support to PythonUDF. ## What changes were proposed in this pull request? When registering a Python UDF, a user may know whether the function can return null

[GitHub] spark issue #18424: [SPARK-17091] Add rule to convert IN predicate to equiva...

2017-07-05 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18424 @rxin Yes, I have. https://issues.apache.org/jira/browse/SPARK-21218?focusedCommentId=16064608&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16064608

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...

2017-06-30 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r125013340 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -610,6 +611,25 @@ object CollapseWindow extends Rule

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...

2017-06-30 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r125013283 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -610,6 +611,25 @@ object CollapseWindow extends Rule

[GitHub] spark issue #18424: [SPARK-21218] Add rule to convert IN predicate to equiva...

2017-06-27 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18424 @a10y Yes. Please have a look at my comments in https://issues.apache.org/jira/browse/SPARK-21218. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-06-26 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17708 @gatorsmile I will run a few more tests to determine if subexpression elimination solves this issue.

[GitHub] spark pull request #18424: [SPARK-21218] Add rule to convert IN predicate to...

2017-06-26 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/18424 [SPARK-21218] Add rule to convert IN predicate to equivalent Parquet filter. ## What changes were proposed in this pull request? Add a new optimizer rule to convert an IN predicate

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to flip adjacent...

2017-06-03 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17899 @hvanhovell @gatorsmile Can you have another look at this?

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117241321 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117240910 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117240492 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -423,4 +423,25 @@ class DataFrameWindowFunctionsSuite

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117238414 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -609,6 +610,19 @@ object CollapseWindow extends Rule

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-08 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/17899 [SPARK-20636] Add new optimization rule to flip adjacent Window expressions. ## What changes were proposed in this pull request? Add new optimization rule to eliminate unnecessary shuffling

[GitHub] spark pull request #17764: Add new function isNotDistinctFrom.

2017-04-25 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/17764 Add new function isNotDistinctFrom. ## What changes were proposed in this pull request? Expose the Spark SQL `<=>` operator in PySpark as a column function. ## How was this
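
`<=>` is Spark SQL's null-safe equality operator. The plain-Python sketch below contrasts it with ordinary SQL equality, using `None` to stand in for SQL NULL. The helper names `eq_null_safe` and `sql_eq` are hypothetical and only model the semantics; they are not the PySpark API proposed in the PR.

```python
def sql_eq(left, right):
    """Ordinary SQL `=`: the result is NULL (None) if either side is NULL."""
    if left is None or right is None:
        return None
    return left == right


def eq_null_safe(left, right):
    """Null-safe `<=>`: always True or False, and NULL <=> NULL is True."""
    if left is None or right is None:
        return left is None and right is None
    return left == right


# Compare the two operators on the four interesting input pairs.
pairs = [(1, 1), (1, 2), (1, None), (None, None)]
comparison = [(sql_eq(a, b), eq_null_safe(a, b)) for a, b in pairs]
```

The null-safe form never yields NULL, which is why it is convenient in join conditions over nullable columns.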

[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-24 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17648 @rxin Actually, @hvanhovell proposed the following rewrites which I think are better:
```
some(cond) => max(cond) = true
every(cond) => min(cond) = true
```
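
For non-null boolean inputs, the `max`/`min` rewrites quoted above can be checked exhaustively with a small plain-Python sketch. It relies on the ordering `False < True`. The helper names are hypothetical, and SQL NULL handling and empty groups are deliberately left out, so this is an illustration rather than the optimizer rewrite itself.

```python
from itertools import product


def some_via_max(values):
    """some(cond) rewritten as max(cond) = true."""
    return max(values) is True


def every_via_min(values):
    """every(cond) rewritten as min(cond) = true."""
    return min(values) is True


def check_rewrites(max_len=4):
    """Compare both rewrites with any()/all() on every non-empty boolean group."""
    for n in range(1, max_len + 1):
        for group in product([False, True], repeat=n):
            if some_via_max(group) != any(group):
                return False
            if every_via_min(group) != all(group):
                return False
    return True
```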

[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-24 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17648 @rxin Ok. So you're proposing rewrites for these aggregates that look something like this?
```
some(cond) => sum(cond) > 0
every(cond) => sum(not(cond)) = 0
```
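
The sum-based rewrites in this comment amount to counting rows. A minimal plain-Python sketch, assuming each boolean is first mapped to 0 or 1 (an explicit conversion that the quoted pseudocode leaves implicit) and that inputs are non-null and non-empty; the helper names are hypothetical:

```python
from itertools import product


def some_via_sum(values):
    """some(cond) as sum(cond) > 0: at least one row is true."""
    return sum(1 if v else 0 for v in values) > 0


def every_via_sum(values):
    """every(cond) as sum(not(cond)) = 0: no row is false."""
    return sum(1 if not v else 0 for v in values) == 0


def check_sum_rewrites(max_len=4):
    """Compare both rewrites with any()/all() on every non-empty boolean group."""
    return all(
        some_via_sum(g) == any(g) and every_via_sum(g) == all(g)
        for n in range(1, max_len + 1)
        for g in product([False, True], repeat=n)
    )
```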

[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-21 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17648 @rxin I'm not sure where you're going with your proposal. These are aggregate functions, not scalar functions.

[GitHub] spark pull request #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-20 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17708#discussion_r112549462 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -387,6 +387,13 @@ case class

[GitHub] spark pull request #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-20 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17708#discussion_r112522049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1007,22 +1006,38 @@ object functions { def map(cols: Column*): Column

[GitHub] spark pull request #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-20 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17708#discussion_r112522069 --- Diff: python/pyspark/sql/functions.py --- @@ -466,6 +466,14 @@ def nanvl(col1, col2): return Column(sc._jvm.functions.nanvl(_to_java_column(col1

[GitHub] spark pull request #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-20 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17708#discussion_r112521982 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1007,22 +1006,38 @@ object functions { def map(cols: Column*): Column

[GitHub] spark pull request #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-20 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17708#discussion_r112521514 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala --- @@ -537,5 +537,10 @@ class PlanParserSuite extends

[GitHub] spark pull request #17708: Add new query hint NO_COLLAPSE.

2017-04-20 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/17708 Add new query hint NO_COLLAPSE. ## What changes were proposed in this pull request? This PR proposes adding a new query hint called NO_COLLAPSE that can be used to prevent adjacent

[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...

2017-04-19 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17650#discussion_r112309456 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala --- @@ -160,4 +166,12 @@ class

[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...

2017-04-17 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17650#discussion_r111733014 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala --- @@ -160,4 +166,12 @@ class

[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...

2017-04-16 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/17650 [SPARK-20350] Add optimization rules to apply Complementation Laws. ## What changes were proposed in this pull request? Apply Complementation Laws during boolean expression simplification
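
Assuming the laws meant here are the standard Boolean ones (`a AND NOT a` is false, `a OR NOT a` is true), a toy simplifier over tuple-encoded expressions might look like the sketch below. The expression encoding and the `simplify` function are hypothetical and unrelated to Catalyst's classes. The sketch also ignores SQL NULL: under three-valued logic `a AND NOT a` is NULL when `a` is NULL, so a real rule has to take nullability into account.

```python
def simplify(expr):
    """Apply the complementation laws bottom-up.

    An expression is a column name (str), a bool literal, or a tuple:
    ("not", e), ("and", l, r) or ("or", l, r).
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    if op in ("and", "or"):
        left, right = args
        if left == ("not", right) or right == ("not", left):
            # a OR NOT a -> True; a AND NOT a -> False
            return op == "or"
    return (op, *args)


contradiction = simplify(("and", "a", ("not", "a")))
tautology = simplify(("or", ("not", "a"), "a"))
```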

[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-16 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17648 Moved this PR to a feature branch and lost comments. The original PR is here: https://github.com/apache/spark/pull/17194

[GitHub] spark pull request #17648: Every any aggregates

2017-04-16 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/17648 Every any aggregates ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how

[GitHub] spark pull request #17194: [SPARK-19851] Add new aggregates EVERY and ANY (S...

2017-04-16 Thread ptkool
Github user ptkool closed the pull request at: https://github.com/apache/spark/pull/17194

[GitHub] spark issue #17194: Add new aggregates EVERY and ANY (SOME).

2017-03-07 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17194 `every` and `any` are also part of the SQL standard, so most SQL users will be familiar with them.

[GitHub] spark issue #17194: Add new aggregates EVERY and ANY (SOME).

2017-03-07 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17194 I think `every` and `any` are more intuitive, particularly if the operand is a boolean expression. For example, `every(t1.c1 > t2.c2)` vs `max(t1.c1 > t2.c2)`. Also, `every` and `any` return

[GitHub] spark issue #17194: Add new aggregates EVERY and ANY (SOME).

2017-03-07 Thread ptkool
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17194 @hvanhovell I'm not following how `min` and `max` could be used.

[GitHub] spark pull request #17194: Add new aggregates EVERY and ANY (SOME).

2017-03-07 Thread ptkool
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/17194 Add new aggregates EVERY and ANY (SOME). ## What changes were proposed in this pull request? This pull request implements the EVERY and ANY aggregates. ## How was this patch tested